Cloud Reliability Engineer

Infios

Remote
Kubernetes cluster management
Infrastructure-as-code (IaC)
Automated deployment pipelines
We develop future technologies to relentlessly make supply chains better

Job Summary

  • Operate, maintain, and improve cloud infrastructure in AWS, Azure, or GCP environments, including managing and optimizing Kubernetes clusters.
  • At Infios, we're not just looking for employees; we're looking for partners in innovation, growth, and purpose.

Skills & Requirements

Must-have

  • Kubernetes cluster management
  • Infrastructure-as-code (IaC)
  • Automated deployment pipelines
  • SRE principles (SLIs, SLOs)
  • Monitoring and alerting dashboards
  • Incident response and troubleshooting
  • Cloud platforms (AWS, Azure, GCP)

Nice-to-have

  • Resilience assessment and drills
  • Chaos engineering practices
  • Problem-solving and automation-first mentality
  • Clear communication of technical issues
  • Passion for operational excellence

Key Requirements

  • 5+ years of experience in Cloud Engineering, DevOps, or Site Reliability roles
  • Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
  • Hands-on experience with cloud platforms (OCI, AWS, Azure, or GCP)
  • Strong knowledge of Kubernetes deployment, management, and troubleshooting
  • Solid understanding of observability and monitoring tools (e.g., Dynatrace, Datadog) and incident management platforms
  • Proficiency in scripting and automation (e.g., Python, Bash, Terraform, Ansible)
  • Strong troubleshooting and analytical skills
  • Experience with incident response, RCA, and postmortem processes
  • Understanding of SRE principles, including SLAs, SLOs, and SLIs

Work Rights

Not specified
