Adobe Media and Data Science Research (MDSR) Laboratory
Job Summary
The role involves defining the long-term reliability and scalability strategy for the Adobe Pass platform while ensuring zero single points of failure.
Candidates will champion advanced automation frameworks to enable zero-touch operations and introduce AI/ML-based predictive monitoring systems.
This position requires leading organization-wide reliability initiatives such as chaos engineering and driving measurable improvements in MTTR and MTBF.
Skills & Requirements
Must-have
12+ years in site reliability or production engineering
Expert proficiency in Python, Go, Java, or Bash
Deep understanding of Kubernetes and microservices
Advanced experience with Terraform and CI/CD
Mastery of Prometheus, Grafana, Datadog, or OpenTelemetry
Nice-to-have
Experience with error budgets and chaos engineering
Prior work in high-traffic media streaming platforms
Familiarity with Kafka, Spark, or Hadoop ecosystems