Senior Site Reliability Engineer

Clearwater Analytics

Noida, India
Fully remote

Job Summary

  • Own the observability platform end-to-end and establish SLO/SLI frameworks.
  • Lead major incident response as an incident commander and drive root-cause analysis.
  • Lead the evolution of Clearwater Analytics' (CWAN) cloud infrastructure on AWS, establishing scalability, resilience, and security standards.

Skills & Requirements

Must-have

  • End-to-end ownership of an observability platform
  • Kubernetes (EKS) platform ownership
  • Infrastructure-as-Code with Terraform
  • CI/CD and automated deployment pipelines
  • Site Reliability Engineering experience

Nice-to-have

  • Financial services experience
  • Service mesh and eBPF
  • Multi-region active-active architectures
  • Staff or principal engineer experience

Key Requirements

  • 7+ years of SRE/Platform Engineering experience
  • Proven incident response leadership
  • Strong observability stack experience
  • Expertise with Kubernetes at scale
  • Advanced AWS proficiency
  • Expert Infrastructure-as-Code with Terraform
  • Proficiency in at least one programming language

Work Rights

Not specified
