SRE (Site Reliability Engineering)

Jobgether

Brazil
On-site
AWS cloud infrastructure management
Kubernetes container orchestration
MLOps environment experience
Jobgether is seeking a Site Reliability Engineer (SRE) in Brazil to work in a dynamic and collaborative MLOps environment. The role focuses on enhancing the reliability and performance of cloud-native systems supporting machine learning operations, with an emphasis on automation and observability.

Job Summary

  • This role sits within a high-impact MLOps environment focused on ensuring the reliability and scalability of infrastructure supporting machine learning models.
  • You will contribute directly to the stability of critical platforms running on AWS and Kubernetes, with a strong emphasis on automation and observability.
  • The position offers an opportunity to have a direct impact on large-scale production systems while growing expertise in SRE, DevOps, and MLOps practices.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • AWS cloud infrastructure management
  • Kubernetes container orchestration
  • MLOps environment experience
  • Automation and observability tools
  • Distributed systems resilience

Nice-to-have

  • Collaborative engineering-driven team
  • Continuous learning culture
  • Proactive problem-solving mindset
  • Fast-paced evolving context
  • Direct impact on large-scale systems

Key Requirements

  • Experience with AWS and Kubernetes
  • Background in MLOps or Data Pipelines
  • Strong understanding of distributed systems

Work Rights

Not specified
