Site Reliability Engineer 5

Adobe Media and Data Science Research (MDSR) Laboratory

Job Summary

  • This role involves defining the long-term reliability and scalability strategy for the Adobe Pass platform while ensuring zero single points of failure.
  • The successful candidate will champion advanced automation frameworks to enable zero-touch operations and introduce AI/ML-based predictive monitoring.
  • Candidates are expected to serve as a technical authority during high-impact incidents and lead blameless postmortems to drive continuous improvement.

Skills & Requirements

Must-have

  • 12+ years SRE or production engineering experience
  • Expert proficiency in Python, Go, Java, or Bash
  • Deep understanding of Kubernetes and microservices
  • Advanced Infrastructure as Code with Terraform
  • Mastery of observability stacks like Prometheus and Grafana

Nice-to-have

  • Experience with chaos engineering and error budgets
  • Background in high-traffic media streaming systems
  • Familiarity with big data ecosystems like Kafka and Spark
  • Hands-on security compliance experience (SOC 2, GDPR)
  • Cloud or Kubernetes professional certifications

Key Requirements

  • Bachelor's or Master's degree in Computer Science or Engineering
  • 12+ years of experience in site reliability or distributed systems
  • Proven track record managing globally distributed cloud-native systems

Work Rights

Not specified
