720 Reliability Engineer Jobs - São Paulo
Reliability Engineer
Job Description
Date: 13 Aug 2025
Company: Alstom
At Alstom, we understand transport networks and what moves people. From high-speed trains, metros, monorails, and trams, to turnkey systems, services, infrastructure, signalling, and digital mobility, we offer our diverse customers the broadest portfolio in the industry. Every day, 80,000 colleagues lead the way to greener and smarter mobility worldwide, connecting cities as we reduce carbon and replace cars.
Could you be the Reliability Engineer in São Paulo we’re looking for?
Your future role
Take on a new challenge and apply your **Reliability, Availability, and Maintainability (RAM)** expertise in a dynamic and innovative field. You’ll work alongside **collaborative and forward-thinking** teammates.
You’ll play a key role in ensuring the reliability of our products, projects, and tenders, contributing to the optimization of performance and cost-effectiveness. Day-to-day, you’ll work closely with cross-functional teams across the business (engineering, maintenance, and product development), perform technical reliability analyses, support continuous improvement initiatives, and much more.
You’ll specifically take care of **RAM calculations, predictions, and verifications**, and also **develop maintenance plans and recommendations** to meet performance targets.
We’ll look to you for:
Managing RAM activities on projects, tenders, and product development, ensuring effective communication with teams.
Performing technical RAM analyses using established methods, standards, and guidelines.
Carrying out reliability calculations, statistical estimations, and producing detailed plans and reports.
Making design, maintenance, and operational recommendations to meet RAM targets.
Defending technical choices with internal teams and customers.
Implementing RAM monitoring processes and issuing key performance indicators (KPIs).
Supporting continuous improvement through the creation of integrated systems.
All about you
We value passion and attitude over experience. That’s why we don’t expect you to have every single skill. Instead, we’ve listed some that we think will help you succeed and grow in this role:
Bachelor’s degree in Electrical Engineering, Mechanical Engineering, Electronics Engineering, Mechatronics Engineering, or Automation Engineering.
Experience or understanding of electrical/electronic hardware, schematics, analog and digital electronics, and probability/statistical concepts.
Knowledge of troubleshooting at the board level, failure analysis, and root cause diagnosis.
Familiarity with RAMS techniques, reliability calculations, and statistical tools (e.g., ReliaSoft BlockSim, Weibull+, or similar).
A technical certification or equivalent experience in reliability engineering is a plus.
Proficiency in MS-Office Suite, especially advanced skills in MS-Word and MS-Excel.
Strong analytical problem-solving skills and the ability to work in a fast-paced environment.
Things you’ll enjoy
Join us on a life-long transformative journey – the rail industry is here to stay, so you can grow and develop new skills and experiences throughout your career. You’ll also:
Enjoy stability, challenges, and a long-term career.
Work with cutting-edge reliability engineering techniques and tools.
Collaborate with transverse teams and helpful colleagues.
Contribute to innovative projects that shape the future of mobility.
Utilise our **flexible and inclusive** working environment.
Steer your career in whatever direction you choose across functions and countries.
Benefit from our investment in your development, through award-winning learning.
Progress towards leadership roles in RAM engineering or other areas of interest.
Benefit from a fair and dynamic reward package that recognises your performance and potential, plus comprehensive and competitive social coverage (life, medical, pension).
You don’t need to be a train enthusiast to thrive with us. We guarantee that when you step onto one of our trains with your friends or family, you’ll be proud. If you’re up for the challenge, we’d love to hear from you!
Important to note
As a global business, we’re an equal-opportunity employer that celebrates diversity across the 63 countries we operate in. We’re committed to creating an inclusive workplace for everyone.
Job no longer available
This position is no longer listed on WhatJobs. The employer may be reviewing applications, may have filled the vacancy, or may have removed the listing.
However, we have similar jobs available for you below.
Data Reliability Engineer
Today
Job Description
Join to apply for the Data Reliability Engineer role at TELUS Digital Brazil
Overview
Welcome to TELUS Digital, where innovation meets impact. As an award-winning digital product consultancy, we're shaping the future of digital experiences through cutting-edge technology, agile thinking, and a culture that puts people first. We are the global digital section of TELUS, one of Canada’s largest telecommunications providers. Our global teams deliver transformative digital solutions and customer experiences for industry leaders in consumer electronics, finance, telecommunications, and utilities.
Location and flexibility
This role can be fully remote for candidates based in the states of São Paulo and Rio Grande do Sul, as well as in the cities of Rio de Janeiro and Belo Horizonte, due to team distribution and occasional in-person opportunities. If you are based in São Paulo or Porto Alegre, you are welcome to work from one of our offices on a flexible schedule.
Qualifications
- 5+ years of hands-on experience in supporting data engineering teams, strongly emphasizing data pipeline enhancement and optimization, and data integration.
- Proficient in cloud computing, preferably Google Cloud Platform (GCP), but AWS and Azure are also valid.
- Experience with cloud data-related services such as BigQuery, Dataflow, Cloud Composer, Dataproc, Cloud Storage, Pub/Sub, or the correlated services from other providers.
- Solid proficiency with Python in terms of data processing.
- Knowledge of SQL and experience with relational databases.
- Proven experience optimizing data pipelines toward efficiency, reducing operational costs, and reducing the number of issues/failures.
- Solid knowledge of monitoring, troubleshooting, and resolving data pipeline issues.
- Familiarity with version control systems like Git.
- Strong English communication and documentation skills.
- Design and implement scalable data pipeline architectures in collaboration with Data Engineers.
- Continuously optimize data pipeline efficiency to reduce operational costs and minimize issues and failures.
- Monitor performance and reliability of data pipelines, enhancing reliability through data quality, analysis, and testing.
- Build and manage automated alerting systems for data pipeline issues.
- Automate repetitive tasks in data processing and management.
- Develop and manage disaster recovery and backup plans.
- In collaboration with other Data Engineering teams, conduct capacity planning for data storage and processing needs.
- Develop and maintain comprehensive documentation for data pipeline systems and processes, and provide knowledge transfer to data-related teams.
- Monitor, troubleshoot and resolve production issues in data processing workflows.
- Maintain infrastructure reliability for data pipelines, enterprise datahub, HPBI, and MDM systems.
- Conduct post-incident reviews and implement improvements for data pipelines.
At TELUS Digital, you’ll work with world-class brands like FOX, HBO, PepsiCo, and Domino's, building transformative digital products that impact millions. Our global reach allows you to collaborate with diverse, international teams, solving complex problems and delivering tech-driven solutions that matter.
We thrive on engineering excellence, using the latest technologies in cloud computing, AI, machine learning, DevOps, microservices architecture, and data engineering. Our teams embrace Agile methodologies, CI/CD pipelines, and a DevOps-first mindset to deliver solutions at scale.
In addition to being part of an international and innovative consultancy company, you will have:
- A Global Innovation Hub: Be part of an international consultancy at the forefront of technology
- Work-Life Harmony: Enjoy flexible hours and autonomy to balance your professional and personal life
- Cutting-Edge Tech Playground: Dive into the latest technologies and shape the future of digital solutions
- Prestigious Partnerships: Collaborate with world-renowned brands, making a real impact in the market
- Growth-Centric Environment: Thrive in our collaborative ecosystem with a clear career development path
- Global Exposure: Embrace optional international travel opportunities to broaden your horizons
At TELUS Digital, we are proud to be an equal opportunity employer and are committed to creating a diverse and inclusive workplace. We are committed to building an inclusive team that represents a variety of backgrounds, perspectives, beliefs, and experiences. Therefore we provide equal employment opportunities to all employees and applicants regardless of race, color, religion, gender identity, sexual orientation, national origin, age, or disability.
We will only use the information you provide to process your application and to produce tracking statistics. Since we do not request personal data deemed sensitive, we ask you to abstain from sharing that information with us.
For more information on how we use your information, see our Privacy Policy.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development
Site Reliability Engineer
Today
Job Description
Summary:
We’re looking for an experienced Platform/Infrastructure Engineer with a strong Microsoft Azure background and deep knowledge of Kubernetes. You'll play a key role in designing, deploying, and maintaining infrastructure and services that power our products. This role requires hands-on experience with automation, modern IaC practices, CI/CD, and maintaining production-grade environments.
The Role:
- Operate, monitor, and improve cloud infrastructure for high-availability services in Azure
- Deploy, configure and manage Kubernetes workloads at scale, including the use of Helm, ArgoCD, Flux, or similar GitOps tools
- Build and maintain CI/CD pipelines using Azure DevOps or similar tooling
- Write and maintain Infrastructure as Code using Terraform or OpenTofu
- Develop scripts and automation to support infrastructure and deployment workflows - PowerShell is preferred
- Collaborate with engineering teams to support platform reliability and enable delivery
- Maintain visibility and awareness through monitoring and logging tools such as Datadog, Azure Monitor, and App Insights.
- Support incident resolution and participate in an on-call rota to help maintain service uptime
The Requirements:
Essential Experience:
- Proven experience in a Platform, Infrastructure, or DevOps engineering role
- Hands-on experience operating 24x7 services in a public cloud, ideally Azure
- Strong experience managing infrastructure using Terraform or OpenTofu
- Experience managing and scaling Kubernetes clusters in production environments
- Proficient with CI/CD tooling, preferably Azure DevOps (YAML pipelines)
- Strong scripting skills using PowerShell
- Experience with monitoring and logging solutions such as Azure Monitor, App Insights, or similar
- Clear communicator with the ability to collaborate across cross-functional teams
Nice to Have:
- Azure certifications (e.g. Azure Administrator, Azure DevOps Engineer)
- Experience with GitOps and tools such as ArgoCD or Flux
- Familiarity with Configuration as Code tools like Ansible or Puppet
- Exposure to large-scale distributed systems or high-volume web APIs
- Awareness of incident response processes and platform reliability best practices
Equal Opportunity Employer
At WTW, we believe difference makes us stronger. We want our workforce to reflect the different and varied markets we operate in and to build a culture of inclusivity that makes colleagues feel welcome, valued and empowered to bring their whole selves to work every day. We are an equal opportunity employer committed to fostering an inclusive work environment throughout our organisation. We embrace all types of diversity.
At WTW, we trust you to know your work and the people, tools and environment you need to be successful. The majority of our colleagues work in a “hybrid” style, with a mix of remote, in-person and in-office interactions dependent on the needs of the team, role and clients. Our flexibility is rooted in trust and “hybrid” is not a one-size-fits-all solution.
Site Reliability Engineer
Posted 3 days ago
Job Description
About CloudWalk:
We are not just another fintech unicorn. We are a pack of dreamers, makers, and tech enthusiasts building the future of payments. With millions of happy customers and a hunger for innovation, we're now expanding our neural network - literally and metaphorically.
The Site Reliability Engineering (SRE) team aims to maximize the engineering velocity of developer teams while keeping products reliable. Working with us you will be responsible for the maintenance of sandbox and staging environments and the automation pipeline to ensure continuous testing.
What You'll Be Doing:
- Help to develop and spread the DevOps culture (we love )
- Create and maintain development sandbox environments
- Automate and orchestrate workloads in cloud environments
- Assist in the configuration, use, and management of test versions and test data
- Integrate automated tests in the delivery pipeline
- Horizontally interact with other SRE and Quality Engineers throughout CloudWalk's engineering team
- Experience with cloud environments (GCP, AWS)
- Solid knowledge of Relational Databases, SQL, and ORM technologies
- Experience with CI tools
- Experience with containers technologies and orchestrators
- A high bar for quality
- Soft skills to master communication and collaboration throughout multiple teams
Join us at CloudWalk, where we’re not just engineering solutions; we’re building a smarter, AI-driven future for payments—together.
By applying for this position, your data will be processed as per CloudWalk's Privacy Policy that you can read here in Portuguese and here in English.
Data Reliability Engineer
Posted 4 days ago
Job Description
Welcome to TELUS Digital, where innovation meets impact. As an award-winning digital product consultancy, we're shaping the future of digital experiences through cutting-edge technology, agile thinking, and a culture that puts people first.
We are the global digital section of TELUS, one of Canada’s largest telecommunications providers. Our global teams deliver transformative digital solutions and customer experiences for industry leaders in consumer electronics, finance, telecommunications, and utilities. With robust multi-shore delivery capabilities, multi-language programs, and secure infrastructure, we ensure exceptional service backed by our multi-billion-dollar parent company.
Location and Flexibility
This role can be fully remote for candidates based in the states of São Paulo and Rio Grande do Sul, as well as in the cities of Rio de Janeiro and Belo Horizonte, due to team distribution and occasional in-person opportunities. If you are based in São Paulo or Porto Alegre, you are welcome to work from one of our offices on a flexible schedule.
Qualifications
- 5+ years of hands-on experience in supporting data engineering teams, strongly emphasizing data pipeline enhancement and optimization, and data integration.
- Proficient in cloud computing, preferably Google Cloud Platform (GCP), but AWS and Azure are also valid.
- Experience with cloud data-related services such as BigQuery, Dataflow, Cloud Composer, Dataproc, Cloud Storage, Pub/Sub, or the correlated services from other providers.
- Solid proficiency with Python in terms of data processing.
- Knowledge of SQL and experience with relational databases.
- Proven experience optimizing data pipelines toward efficiency, reducing operational costs, and reducing the number of issues/failures.
- Solid knowledge of monitoring, troubleshooting, and resolving data pipeline issues.
- Familiarity with version control systems like Git.
- Strong English communication and documentation skills.
- Design and implement scalable data pipeline architectures in collaboration with Data Engineers.
- Continuously optimize data pipeline efficiency to reduce operational costs and minimize issues and failures.
- Monitor performance and reliability of data pipelines, enhancing reliability through data quality, analysis, and testing.
- Build and manage automated alerting systems for data pipeline issues.
- Automate repetitive tasks in data processing and management.
- Develop and manage disaster recovery and backup plans.
- In collaboration with other Data Engineering teams, conduct capacity planning for data storage and processing needs.
- Develop and maintain comprehensive documentation for data pipeline systems and processes, and provide knowledge transfer to data-related teams.
- Monitor, troubleshoot and resolve production issues in data processing workflows.
- Maintain infrastructure reliability for data pipelines, enterprise datahub, HPBI, and MDM systems.
- Conduct post-incident reviews and implement improvements for data pipelines.
At TELUS Digital, you’ll work with world-class brands like FOX, HBO, PepsiCo, and Domino's, building transformative digital products that impact millions. Our global reach allows you to collaborate with diverse, international teams, solving complex problems and delivering tech-driven solutions that matter.
We thrive on engineering excellence, using the latest technologies in cloud computing, AI, machine learning, DevOps, microservices architecture, and data engineering. Our teams embrace Agile methodologies, continuous integration and deployment (CI/CD) pipelines, and a DevOps-first mindset to deliver solutions at scale.
In addition to being part of an international and innovative consultancy company, you will have:
- A Global Innovation Hub: Be part of an international consultancy at the forefront of technology
- Work-Life Harmony: Enjoy flexible hours and autonomy to balance your professional and personal life
- Cutting-Edge Tech Playground: Dive into the latest technologies and shape the future of digital solutions
- Prestigious Partnerships: Collaborate with world-renowned brands, making a real impact in the market
- Growth-Centric Environment: Thrive in our collaborative ecosystem with a clear career development path
- Global Exposure: Embrace optional international travel opportunities to broaden your horizons
Some of our benefits:
- Health and dental plan
- Life insurance
- Monthly voucher for meals, culture, education, health and mobility
- Child care assistance and more!
At TELUS Digital, we are proud to be an equal opportunity employer and are committed to creating a diverse and inclusive workplace. We are committed to building an inclusive team that represents a variety of backgrounds, perspectives, beliefs, and experiences. Therefore we provide equal employment opportunities to all employees and applicants regardless of race, color, religion, gender identity, sexual orientation, national origin, age, or disability.
We will only use the information you provide to process your application and to produce tracking statistics. Since we do not request personal data deemed sensitive, we ask you to abstain from sharing that information with us.
For more information on how we use your information, see our Privacy Policy.
Site Reliability Engineer
Posted 5 days ago
Job Description
• Ensure the availability, resilience, and scalability of production services.
• Build and maintain monitoring, logging, tracing, and intelligent alerting for critical systems.
• Develop and maintain CI/CD pipelines for fast, secure deliveries.
• Implement Infrastructure as Code (IaC) using Terraform, Ansible, or CloudFormation.
• Work with Kubernetes and container orchestration for distributed environments.
• Define and track SLIs, SLOs, and SLAs together with the engineering teams.
• Lead incident analyses and post-mortems, proposing continuous improvements.
• Work with security, governance, and compliance in cloud environments.
*Desirable requirements*
• Proven experience as an SRE, DevOps, or Infrastructure Engineer.
• Strong command of cloud computing (AWS, GCP, or Azure).
• Strong experience with Kubernetes and Docker.
• Advanced knowledge of observability (Prometheus, Grafana, Datadog, New Relic, etc.).
• Knowledge of automation languages (Python, Go, Shell Script).
• Hands-on practice with SRE principles: SLIs, SLOs, SLAs, and Error Budgets.
*Nice to have*
• Cloud certifications (AWS Solutions Architect, GCP Professional Cloud Engineer, Azure Expert).
• Experience with cloud migration and application modernization.
• Knowledge of microservices architectures.
Site Reliability Engineer
Posted 8 days ago
Job Description
Personetics is shaping the Cognitive Banking era, harnessing AI to help banks anticipate customer needs, provide actionable insights, and deliver intelligent financial guidance. Our platform continuously analyzes and leverages real-time transactional data, enabling banks to proactively support customers in managing their finances and reaching their goals. As industry leaders—yes, we really are leaders—we partner with the world’s top financial institutions, empowering over 150 million customers monthly across 35 global markets from offices in New York, London, Singapore, São Paulo, and Tel Aviv.
About the position
We are seeking a Site Reliability Engineer to join our Cloud Operations team in Brazil. In this role, you’ll help design, deploy, and maintain reliable, scalable cloud solutions, support customer onboarding, troubleshoot production issues, and optimize system performance. This is a great opportunity to grow your skills while working with modern cloud, container, and automation technologies in a global, fast-paced environment.
Responsibilities
- Install, integrate, and operate end-to-end solutions and features, from design to production.
- Manage production systems and oversee CI/CD pipelines.
- Support customers during onboarding, including connecting and integrating their data into our system.
- Research, diagnose, troubleshoot, and resolve recurring environment issues.
- Participate in the on-call rotation and serve as an escalation point for incidents.
- Contribute to service design and architecture to proactively prevent system failures.
- 2-5 years of experience in Application Integration, SRE, or Production Operations.
- Bachelor’s degree in Computer Science, Software Engineering, or a related field.
- Hands-on experience with:
  - Linux and Docker
  - Kubernetes on AKS or other container orchestration tools
  - Terraform or similar IaC tools; experience with GitOps
  - CI/CD solutions, preferably Jenkins
  - Networking, including configuring WAF rules, IP whitelisting, and troubleshooting
- Strong problem-solving skills with the ability to prioritize effectively.
- High level of proficiency in English, both written and spoken.
- Experience with Maven and Nexus or similar registry solutions
- Familiarity with Git version control systems
- Knowledge of databases such as MySQL and PostgreSQL
- Scripting skills in Python, Bash, or Groovy
Site Reliability Engineer
Posted 9 days ago
Job Description
Personetics is shaping the Cognitive Banking era, harnessing AI to help banks anticipate customer needs, provide actionable insights, and deliver intelligent financial guidance. Our platform continuously analyzes and leverages real-time transactional data, enabling banks to proactively support customers in managing their finances and reaching their goals. As industry leaders—yes, we really are leaders—we partner with the world’s top financial institutions, empowering over 150 million customers monthly across 35 global markets from offices in New York, London, Singapore, São Paulo, and Tel Aviv.
About the position
We are seeking a Site Reliability Engineer to join our Cloud Operations team in Brazil. In this role, you’ll help design, deploy, and maintain reliable, scalable cloud solutions, support customer onboarding, troubleshoot production issues, and optimize system performance. This is a great opportunity to grow your skills while working with modern cloud, container, and automation technologies in a global, fast-paced environment.
Responsibilities
- Install, integrate, and operate end-to-end solutions and features, from design to production.
- Manage production systems and oversee CI/CD pipelines.
- Support customers during onboarding, including connecting and integrating their data into our system.
- Research, diagnose, troubleshoot, and resolve recurring environment issues.
- Participate in the on-call rotation and serve as an escalation point for incidents.
- Contribute to service design and architecture to proactively prevent system failures.
- 2-5 years of experience in Application Integration, SRE, or Production Operations.
- Bachelor’s degree in Computer Science, Software Engineering, or a related field.
- Hands-on experience with:
  - Linux and Docker
  - Kubernetes on AKS or other container orchestration tools
  - Terraform or similar IaC tools; experience with GitOps
  - CI/CD solutions, preferably Jenkins
  - Networking, including configuring WAF rules, IP whitelisting, and troubleshooting
- Strong problem-solving skills with the ability to prioritize effectively.
- High level of proficiency in English, both written and spoken.
- Experience with Maven and Nexus or similar registry solutions
- Familiarity with Git version control systems
- Knowledge of databases such as MySQL and PostgreSQL
- Scripting skills in Python, Bash, or Groovy
- Customer-facing experience.
Site Reliability Engineer
Posted 22 days ago
Job Description
Our US-based client is looking for a mission-driven Site Reliability Engineer to support and scale the infrastructure powering their secure, mission-critical SaaS platform.
You must be confident in operating and debugging both modern infrastructure (cloud-native, containerized services) and classic Windows production environments (IIS, SQL Server AlwaysOn, Service Broker), with the ability to respond to incidents quickly, support ongoing automation, and scale systems reliably.
Responsibilities
- Be part of the team that owns the uptime and performance of our core backend infrastructure (Windows + Linux)
- Maintain and enhance observability across systems using Kibana, CloudWatch, and custom telemetry
- Manage CI/CD pipelines, infrastructure as code (Terraform, Ansible), and deployment automation
- Support and maintain production Windows environments:
  - .NET Framework/Core apps running in IIS
  - SQL Server with AlwaysOn replication and Service Broker-based messaging
- Support and operate cloud-native services:
  - AWS Lambdas, DynamoDB, Postgres/Aurora, Redshift, Redis, and containerized workloads in Docker
- Participate in on-call rotation and incident response
- Collaborate closely with engineering teams to improve system reliability and deployment workflows
- 5+ years of SRE, DevOps, or WebOps experience supporting production SaaS systems
- Strong experience with Windows Server, IIS, and .NET applications in production
- Hands-on experience with SQL Server administration, including AlwaysOn and Service Broker
- Proficiency in AWS operations, including Lambda, DynamoDB, CloudWatch, and IAM
- Familiarity with Postgres, Redis, Kibana/ElasticSearch, and centralized logging
- Experience with Docker, Terraform, and Ansible for infrastructure management
- Strong scripting skills (PowerShell, Python)
- Experience running and debugging containerized and distributed systems in production
- Excellent incident response and debugging skills
Salary: $6,000 USD/month + Holidays
Unlimited PTO
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Other
- Industries: IT Services and IT Consulting
Referrals increase your chances of interviewing at Sur LATAM by 2x