2,448 Reliability Engineer Jobs - Brazil

Reliability Engineer

Belém, Pará Kerry

Yesterday

Job Description

About Kerry

Kerry is the world's leading taste and nutrition company for the food, beverage and pharmaceutical industries. Every day we partner with customers to create healthier, tastier and more sustainable products that are consumed by billions of people across the world. Our vision is to be our customers' most valued partner, creating a world of sustainable nutrition. A career with Kerry offers you an opportunity to shape the future of food while providing you opportunities to explore and grow in a truly global environment.

About the role

This position leads the site’s Preventative Maintenance and Mechanical Integrity functions, with a strong focus on building and sustaining a robust Reliability-Centered Maintenance (RCM) program. Key responsibilities include enhancing inspection procedures, continuously reviewing and updating the critical equipment list, and providing training to the maintenance team as needed.

The role works in close partnership with the Maintenance Manager to define, plan, and schedule all maintenance activities—ranging from daily tasks and one-day outages to preventive maintenance and major shutdowns.

Key responsibilities

Lead Failure Investigations: Conduct thorough investigations into equipment failures using methodologies such as Failure Modes and Effects Analysis (FMEA) and Root Cause Analysis (RCA). Develop and implement corrective actions to eliminate root causes and prevent recurrence.
Optimize Preventive Maintenance Programs: Design and continuously refine preventive maintenance strategies based on equipment criticality and historical failure data. Prioritize tasks to minimize operational impact and maximize asset reliability and lifespan.
Implement Predictive Maintenance Technologies: Deploy and manage predictive maintenance tools such as vibration analysis, infrared thermography, oil analysis, and ultrasonic testing. Leverage data insights to proactively schedule maintenance and avoid unplanned downtime.
Analyze Equipment Performance: Monitor and interpret equipment performance metrics to identify trends, inefficiencies, and potential risks. Use statistical tools and reliability indicators to drive data-informed decisions and continuous improvement.
Foster Cross-Functional Collaboration: Partner with operations, maintenance, and engineering teams to resolve reliability challenges, share best practices, and support plant-wide improvement initiatives. Provide technical leadership to align efforts with maintenance and reliability goals.
Train and Mentor Maintenance Personnel: Provide training and mentorship on reliability best practices, preventive and predictive maintenance techniques, and effective troubleshooting. Promote a proactive maintenance culture across the organization.
Maintain Documentation and Reporting: Keep detailed records of maintenance activities, equipment performance, and reliability metrics. Prepare and present reports to management, highlighting key findings, progress on initiatives, and recommendations for improvement.

Qualifications and skills

Bachelor’s degree in Engineering or completion of a technical school training program.
3–5 years of experience as a Reliability Engineer or in a similar role within a manufacturing environment, preferably in the Food & Beverage industry.
1–3 years of maintenance and supervisory experience preferred.
CMRP (Certified Maintenance & Reliability Professional) certification preferred.
Strong troubleshooting and problem-solving skills.
Experience with TPM (Total Productive Maintenance) and/or Lean Manufacturing initiatives.
Proven experience developing and managing predictive and preventive maintenance programs.
Proficiency with computer applications, including SAP, CMMS, and other business software.
Solid understanding of networked systems and PC-based tools used in maintenance and reliability operations.

Compensation

The typical hiring range for this role is $75,602 to $123,432 annually, based on factors including, but not limited to, education, work experience, certifications, and location. Kerry offers a comprehensive benefits package, incentive and recognition programs, equity stock purchase, and retirement contributions (all benefits and incentives are subject to eligibility requirements).

Equal Employment Opportunity

Kerry is an equal opportunity employer. Employment decisions are made without regard to race, color, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected by law. Kerry will only employ those who are legally authorized to work in the United States for this opening. Any offer of employment is conditional upon the successful completion of a background investigation and drug screen. Additional information can be found at: Know Your Rights: Workplace Discrimination is Illegal (dol.gov).

Job details

Seniority level: Mid-Senior level
Employment type: Full-time
Industries: Food and Beverage Manufacturing
Location: Allentown, PA (also Bethlehem, PA as listed in posting)

Reliability Engineer

Flinks

Yesterday

Job Description

Flinks is where financial data moves—with purpose, trust, and impact.

We’re on a mission to simplify access to financial data and help businesses build better, faster, and more secure financial products and experiences. Since 2016, we’ve been bridging the gap between fintechs, financial institutions, and consumers by enabling seamless, secure data connectivity.

From instant account funding to smarter lending, our solutions help power some of the most innovative financial products in North America. We partner with lenders, banks, and fintechs to streamline onboarding, prevent fraud, and fuel real-time decision-making with enriched, reliable data.

As pioneers in Canada’s open banking movement, we’re not waiting for the future—we’re building it. If you’re bold, curious, and ready to help shape the future of finance, we’d love to meet you.

About the Reliability Team

As a Reliability Engineer, you will play a pivotal role in ensuring the stability, performance, and reliability of Flinks Fintech product platforms and their monitoring and alerting systems. You will serve as an expert in both software development and system support, working closely with engineering, operations, and product teams to troubleshoot complex issues, resolve incidents, and continuously improve the technical foundation of our products. This role demands a combination of advanced coding skills, incident management experience, and an understanding of the fintech industry.

What You’ll Do

Develop and maintain code to quickly resolve product issues, ensuring fast recovery and long-term system stability
Provide live operational support across multiple client applications, monitoring services and alerts to detect and resolve critical failures with minimal downtime
Own and troubleshoot complex incidents, conduct root cause analyses, and implement long-term solutions—adhering to SLAs and internal SLOs
Build monitoring dashboards and alerting systems to proactively detect and address issues, supporting system scalability and stability
Analyze operational metrics and KPIs to identify trends, surface client pain points, and drive improvements
Automate tooling and processes to improve efficiency and reduce manual work across LiveOps
Collaborate with cross-functional teams to deliver lasting fixes for production issues and contribute to technical analyses of product gaps
Lead and mentor reliability engineers, providing guidance and ensuring consistent delivery of high-quality work
Participate in post-incident reviews, documenting outcomes and driving preventative action items
Support after-hours on-call coverage as part of the LiveOps rotation

Qualifications

5+ years of experience with .NET Framework (C#), ensuring production system stability
Strong coding, debugging, and troubleshooting skills, particularly in performance optimization of large-scale applications
Operationally focused with expertise in incident management and resolving live production issues
Proven experience in building and maintaining reliable monitoring and alerting systems in high-demand environments, with a focus on production support
Strong knowledge of Kubernetes, Docker, and cloud platforms (GCP preferred)
Proficiency with monitoring tools like Prometheus, Grafana, and Kibana
Experience with incident ticketing/documentation tools like FreshDesk and Confluence
Critical thinker who can identify system weaknesses and find innovative solutions
Strong project management skills with a focus on scalability and system stability

Nice to haves

ITIL Service Management certification (or equivalent) is highly desired, such as ITIL v3, ITIL v4, or other equivalent certifications
Experience with PowerBI, web scraping, or Golang

The Interview Process

Head of People Ops
Case Assignment & Presentation
Director Interview

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Technology, Information and Internet

Reliability Engineer

Laguna, Santa Catarina Philippine Geothermal Production Company, Inc.

Posted 10 days ago

Job Description

Philippine Geothermal Production Company, Inc. (PGPC) is a Filipino corporation operating the Tiwi geothermal steam field in the province of Albay and the Mak-Ban geothermal steam field in the provinces of Batangas and Laguna. It is owned by the SM Investments Corporation.

Tiwi and Mak-Ban are the result of a successful partnership between Philippine Geothermal’s predecessor and the National Power Corporation that began in 1971, when a 2.5 kW government experiment was transformed into the first commercial geothermal power project in Southeast Asia, and led to the birth of the geothermal industry in the Philippines.

Philippine Geothermal continues its legacy of providing a clean, stable, reliable, and renewable source of energy to meet the country’s growing power requirements. Its vision is to be the leading geothermal energy company, recognized not only for its world-class performance but also for contributing to the improvement in the lives of the people in the communities where it operates.

The successful candidate will provide engineering support to the Assets in ensuring the reliability, operability, availability, and maintainability of the steam fields’ equipment, process/systems, and people, in order to help in optimizing generation, minimizing costs and achieving performance objectives.

The Role

Implements Reliability and Integrity Management Process (RIM) programs
Leads the review of maintenance philosophies in collaboration with Operations and Maintenance (O&M) personnel on existing and new equipment through Failure Modes & Effects Analysis (FMEA) and/or Reliability-Centered Maintenance (RCM) philosophies and recommends Preventive/Predictive Maintenance (PM/PdM) techniques and other improvements, as necessary
Provides reliability data interpretation, analysis, and reporting
Develops and calculates reliability metrics as a basis for reliability improvements/programs
Coordinates with the Quality Assurance (QA) Group in implementing inspection/reliability programs
Supports asset expense and capital projects to ensure quality, timeliness, and cost-effectiveness
Ensures compliance with the OE requirements, engineering and industry codes and standards, QA/QC programs and government regulations in the conduct of activities
Coaches and mentors other engineers to develop their skills and competencies

The Individual

Bachelor’s degree in Engineering, preferably Mechanical, Chemical, or Industrial
At least 5 years of working experience in Reliability and/or Maintenance Engineering
Preferably with experience in facilitating Reliability-Centered Maintenance (RCM) and/or Failure Modes & Effects Analysis (FMEA) workshops
Well-versed in engineering codes, industry standards, and practices
Experience in Process Engineering and Project Engineering is an advantage

If you are encountering difficulties submitting your application through this website, kindly send your resume and filled out application form directly to .

Site Reliability Engineer

ITeam

Today

Job Description

About the Company

With more than 20 years in the market, ITeam stands out for its commitment to the customer. We base our relationships on solid values and clear objectives, offering IT solutions and services that help our clients achieve their goals. Our mission is to provide IT services that align with our clients' strategy and processes, always backed by qualified people.

Senior SRE

Fluent Spanish

Night shift, 9 p.m. to 6 a.m. Brazil time

Remote

About the Role

The professional will be responsible for supporting and resolving level 2 and level 3 tickets, as well as monitoring billing cycles and the daily payment receipt and collections flows.

Responsibilities

Supports and resolves level 2 and level 3 tickets (root cause analysis and routing).
Monitors billing cycles to ensure deliveries (invoice issuance, tax and accounting deliverables, etc.).
Monitors the daily payment receipt and collections flows to ensure deliveries to the business.
Helps the tech lead/leadership solve reliability problems and prioritizes project activities in light of business challenges and the needs of the solution.
Proactively asks for feedback, listens, and continuously improves.
Is self-taught and regularly learns new things on their own initiative.
Pays attention to what other projects have already done and brings past experience to the current project to minimize errors.
Adapts quickly to project changes such as new tasks, reprioritization, and technical support requests.
Maintains the quality of the solutions developed regardless of the complexity of the task or process being improved.
Keeps their "radar" on, watches for risks and assumptions, and mobilizes to reach the goals set with the team.
Has a strong ability to get complex matters done through drive, creativity, and past experience.
Stays focused on making products reliable.
Maps the current state to identify possible improvements and make the platform more resilient.

Qualifications

Experience with the Linux operating system (e.g., Debian, Red Hat) in text mode.

Required Skills

Writing scripts in Shell Script or PowerShell
Automation with Terraform | CloudFormation | Pulumi *
Git
CI/CD know-how
Experience with Jenkins or GitLab
Experience with Docker
Basic knowledge of Kubernetes
Knowledge of cloud platforms: AWS | Azure or GCP
Experience working in agile teams
Experience estimating timelines and taking part in backlog planning
Ability to develop solutions with docker and docker-compose for microservices, APIs, etc.
Builds automations that are efficient and scale to some degree when needed (adaptability, performance, and reliability).
Is proficient in creating alerts and essential metrics for systems using tools or services such as Splunk, Prometheus, Grafana, CloudWatch, etc.
Shares solutions and lessons learned with the team and the community.
Runs and/or supports Chaos Engineering using performance and failure testing tools, etc. (e.g., JMeter, P4All)

Preferred Skills

Has technical command of the programming language used to build solutions, as well as of cloud, security, and performance.
Builds automations or resources that are easy to reuse and maintain.
Identifies root causes and runs postmortem sessions, reducing complexity when handling future incidents.
Shares their technical solutions, aiming to make them a reference, especially for other SREs.
Applies reliability guidelines in their solutions and gives technical support so the team does the same.
Implements metrics and alerts to keep solutions aligned with the business and the customer experience.
Automates continuous deployment to avoid repetitive tasks.
Experience with cloud platforms: AWS, Azure, or GCP
Kubernetes orchestration
Experience with CI/CD
Observability tools
Experience with continuous deployment tools (Terraform, Puppet)
"Automate everything possible" mindset
Experience with infrastructure as code: Terraform and CloudFormation
Knowledge of a programming language: Java, Kotlin, Go, Python, Ruby, or Rust.
Experience dealing with critical or highly scalable environments.
Experience providing services to companies in the telecom sector

Deployment Reliability Engineer

HCLTech

Posted 2 days ago

Job Description

Your role and responsibilities:

Manage continuous delivery and configuration of SAP Ariba Cloud products using modern deployment tools.
Respond quickly to deployment requests and provide technical support for the SAP Ariba suite.
Collaborate with engineering subject matter experts to ensure seamless operations.
Handle user tickets and change requests within defined SLAs.
Automate manual tasks to improve scalability and efficiency.
Lead complex deployment projects including new site setups and disaster recovery planning.
Manage certificate renewals and troubleshoot related issues.
Document SOPs and apply ITIL best practices.

Requirements and Qualifications:

Experience in a Unix/Linux environment.
Familiarity with SAP Ariba Cloud products.
Proven experience in 24x7 enterprise environments.
Hands-on expertise with cloud provisioning (preferably GCP and AWS).
Proficiency in Terraform and CI/CD tools like Jenkins, Artifactory, Docker, Vault.
Development experience in Python, Go, or Groovy.
Strong knowledge of system applications (Apache, DNS, SSH, TCP/IP, NFS).
Deep understanding of OS internals and file system structures.
Experience with certificate management and scripting (Perl, Python, Shell).
Basic knowledge of HANA database administration.
Excellent communication, analytical, and multitasking skills.
Bachelor’s degree in MIS, CS, or equivalent experience.
Advanced English

Please submit resumé in English

Site Reliability Engineer

HCLTech

Posted 2 days ago

Job Description

Your role and responsibilities:

Handling major incidents via CIRS (Critical Issue Response System) and providing frequent updates until resolution.
Performing deep-dive application troubleshooting and identifying preventive actions.
Managing CIRS-related requests including deployments, feature toggles, and data fixes.
Following up on major production incidents and coordinating with cross-functional teams.
Enhancing monitoring capabilities using tools like Dynatrace, Kibana, and Splunk.
Writing and improving monitoring scripts and alerts based on incident learnings.
Handling customer escalations and coordinating with Support & Engineering teams.
Supporting planned activities and responding to ad-hoc requests from CES teams.

Requirements and Qualifications:

Deep experience in DevOps and Production Support.
Experience in automation and CI/CD practices.
Familiarity with cloud platforms (GCP, AWS, or Azure preferred).
Hands-on experience with monitoring tools such as Dynatrace, Kibana, and Splunk.
Strong troubleshooting skills and ability to deep dive into application issues.
Excellent communication and coordination skills across teams.

Please submit resumé in English.

Site Reliability Engineer

Gauge

Posted 15 days ago

Job Description

We are a Stefanini Group company. Specializing in digital marketing, we use an integrated approach that combines technology, data intelligence, design, and a deep understanding of consumer behavior. Our focus is on boosting our partners' results, offering solutions that range from strategic consulting to project execution and monitoring. With a dedicated and highly qualified team, Gauge stands out for its ability to understand each client's specific needs and deliver high-performance results.

With a strong presence in Latin America and expanding in the United States, we stay at the forefront, applying the latest market trends and keeping a close eye on continuous innovation.

Site Reliability Engineer

Buenos Aires, Pernambuco DEUNA

Today

Job Description

Overview

As a mid-level SRE at DEUNA, you’ll ensure the reliability, scalability, and performance of our AWS-based platform by integrating observability, automation, and SRE best practices across the software lifecycle. You will work closely with development teams to improve uptime, provide observability tooling, and ensure we scale efficiently and securely.

Key Responsibilities

Design, define, and maintain observability and monitoring for our AWS infrastructure
Define and track SLIs, SLOs, and SLAs for critical systems
Improve system uptime, latency, and fault tolerance across the platform
Provide internal libraries and toolsets to developers for diagnostics and debugging
Manage scaling, performance, and resilience efforts related to system reliability
Collaborate with technical teams on capacity planning, load testing, and scaling policies
Improve production operations by defining and evolving deployment strategies and conducting disaster recovery (DR) testing

Technical Skills

Expertise with Prometheus, Grafana, OpenTelemetry, AWS CloudWatch, or other observability tools
Experience designing dashboards, alerts, and log aggregation pipelines
Deep understanding of AWS services: ECS, Lambda, RDS, CodePipeline
Strong proficiency in Go programming language
Skilled at defining SLIs, SLOs, error budgets, and improving Mean Time to Recovery (MTTR)
Experience conducting failure drills (e.g., Chaos Monkey, Gremlin) to ensure system resilience

Soft Skills

Excellent communication and collaboration skills
Adaptability to thrive in dynamic, fast-paced environments
Strong time management and task prioritization
Proficiency in English

What you will find when you join DEUNA

A multicultural team distributed throughout LATAM
Dynamism, agility and constant innovation
Being part of a high-impact solution for an entire region
The best tools and technology to operate
Being part of the startup culture
We are in full expansion!

Benefits

Vacations and additional PTO
Remote work from anywhere
Economic support for health insurance, internet and cell phone line
We all own DEUNA, we offer stock options
Learning and development platform
Multidisciplinary, diverse and dynamic team
Growth and career path
Be part of a dynamic team that's creating the next generation payments platform
Join us at DEUNA

Details

Seniority level: Not Applicable
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Software Development

Data Reliability Engineer

São Paulo, São Paulo TELUS Digital Brazil

Today

Job Description

Overview

Welcome to TELUS Digital, where innovation meets impact. As an award-winning digital product consultancy, we're shaping the future of digital experiences through cutting-edge technology, agile thinking, and a culture that puts people first. We are the global digital section of TELUS, one of Canada’s largest telecommunications providers. Our global teams deliver transformative digital solutions and customer experiences for industry leaders in consumer electronics, finance, telecommunications, and utilities.

Location and flexibility

This role can be fully remote for candidates based in the states of São Paulo and Rio Grande do Sul, as well as in the cities of Rio de Janeiro and Belo Horizonte, due to team distribution and occasional in-person opportunities. If you are based in São Paulo or Porto Alegre, you are welcome to work from one of our offices on a flexible schedule.

Qualifications

5+ years of hands-on experience in supporting data engineering teams, strongly emphasizing data pipeline enhancement and optimization, and data integration.
Proficient in cloud computing, preferably Google Cloud Platform (GCP), but AWS and Azure are also valid.
Experience with cloud data-related services such as BigQuery, Dataflow, Cloud Composer, Dataproc, Cloud Storage, Pub/Sub, or the correlated services from other providers.
Solid proficiency with Python in terms of data processing.
Knowledge of SQL and experience with relational databases.
Proven experience optimizing data pipelines toward efficiency, reducing operational costs, and reducing the number of issues/failures.
Solid knowledge of monitoring, troubleshooting, and resolving data pipeline issues.
Familiarity with version control systems like Git.
Strong English communication and documentation skills.

Responsibilities

Design and implement scalable data pipeline architectures in collaboration with Data Engineers.
Continuously optimize data pipeline efficiency to reduce operational costs and minimize issues and failures.
Monitor performance and reliability of data pipelines, enhancing reliability through data quality, analysis, and testing.
Build and manage automated alerting systems for data pipeline issues.
Automate repetitive tasks in data processing and management.
Develop and manage disaster recovery and backup plans.
In collaboration with other Data Engineering teams, conduct capacity planning for data storage and processing needs.
Develop and maintain comprehensive documentation for data pipeline systems and processes, and provide knowledge transfer to data-related teams.
Monitor, troubleshoot and resolve production issues in data processing workflows.
Maintain infrastructure reliability for data pipelines, enterprise datahub, HPBI, and MDM systems.
Conduct post-incident reviews and implement improvements for data pipelines.

Why TELUS Digital?

At TELUS Digital, you’ll work with world-class brands like FOX, HBO, PepsiCo, and Domino's, building transformative digital products that impact millions. Our global reach allows you to collaborate with diverse, international teams, solving complex problems and delivering tech-driven solutions that matter.

We thrive on engineering excellence, using the latest technologies in cloud computing, AI, machine learning, DevOps, microservices architecture, and data engineering. Our teams embrace Agile methodologies, CI/CD pipelines, and a DevOps-first mindset to deliver solutions at scale.

In addition to being part of an international and innovative consultancy company, you will have:

A Global Innovation Hub: Be part of an international consultancy at the forefront of technology
Work-Life Harmony: Enjoy flexible hours and autonomy to balance your professional and personal life
Cutting-Edge Tech Playground: Dive into the latest technologies and shape the future of digital solutions
Prestigious Partnerships: Collaborate with world-renowned brands, making a real impact in the market
Growth-Centric Environment: Thrive in our collaborative ecosystem with a clear career development path
Global Exposure: Embrace optional international travel opportunities to broaden your horizons

Equality

At TELUS Digital, we are proud to be an equal opportunity employer and are committed to creating a diverse and inclusive workplace. We are committed to building an inclusive team that represents a variety of backgrounds, perspectives, beliefs, and experiences. Therefore we provide equal employment opportunities to all employees and applicants regardless of race, color, religion, gender identity, sexual orientation, national origin, age, or disability.

We will only use the information you provide to process your application and to produce tracking statistics. Since we do not request personal data deemed sensitive, we ask you to abstain from sharing that information with us.

For more information on how we use your information, see our Privacy Policy.

Seniority level: Mid-Senior level

Employment type: Full-time

Job function: Engineering and Information Technology

Industries: Software Development

Site Reliability Engineer

São Paulo, São Paulo Willis Towers Watson

Today

Job Description

Description

Summary:

We’re looking for an experienced Platform/Infrastructure Engineer with a strong Microsoft Azure background and deep knowledge of Kubernetes. You'll play a key role in designing, deploying, and maintaining infrastructure and services that power our products. This role requires hands-on experience with automation, modern IaC practices, CI/CD, and maintaining production-grade environments.

The Role:

Operate, monitor, and improve cloud infrastructure for high-availability services in Azure
Deploy, configure and manage Kubernetes workloads at scale, including the use of Helm, ArgoCD, Flux, or similar GitOps tools
Build and maintain CI/CD pipelines using Azure DevOps or similar tooling
Write and maintain Infrastructure as Code using Terraform or OpenTofu
Develop scripts and automation to support infrastructure and deployment workflows - PowerShell is preferred
Collaborate with engineering teams to support platform reliability and enable delivery
Maintain visibility and awareness through monitoring and logging tools such as Datadog, Azure Monitor, and App Insights
Support incident resolution and participate in an on-call rota to help maintain service uptime

Qualifications

The Requirements:

Essential Experience:

Proven experience in a Platform, Infrastructure, or DevOps engineering role
Hands-on experience operating 24x7 services in a public cloud, ideally Azure
Strong experience managing infrastructure using Terraform or OpenTofu
Experience managing and scaling Kubernetes clusters in production environments
Proficient with CI/CD tooling, preferably Azure DevOps (YAML pipelines)
Strong scripting skills using PowerShell
Experience with monitoring and logging solutions such as Azure Monitor, App Insights, or similar
Clear communicator with the ability to collaborate across cross-functional teams

Nice to Have:

Azure certifications (e.g. Azure Administrator, Azure DevOps Engineer)
Experience with GitOps and tools such as ArgoCD or Flux
Familiarity with Configuration as Code tools like Ansible or Puppet
Exposure to large-scale distributed systems or high-volume web APIs
Awareness of incident response processes and platform reliability best practices

Equal Opportunity Employer

At WTW, we believe difference makes us stronger. We want our workforce to reflect the different and varied markets we operate in and to build a culture of inclusivity that makes colleagues feel welcome, valued and empowered to bring their whole selves to work every day. We are an equal opportunity employer committed to fostering an inclusive work environment throughout our organisation. We embrace all types of diversity.

At WTW, we trust you to know your work and the people, tools and environment you need to be successful. The majority of our colleagues work in a “hybrid” style, with a mix of remote, in-person and in-office interactions dependent on the needs of the team, role and clients. Our flexibility is rooted in trust and “hybrid” is not a one-size-fits-all solution.
