Site Reliability Engineer

ADOBE

Job Summary

  • This role involves defining the long-term reliability and scalability strategy for Illustrator Enterprise Services to support globally distributed workflows.
  • The successful candidate will champion advanced automation frameworks and introduce AI/ML-based predictive monitoring to anticipate failures before they impact users.
  • Adobe offers a culture where employees are empowered to make an impact through innovative platforms powered by AI and human ingenuity.

Skills & Requirements

Must-have

  • Design highly available, globally distributed systems
  • Expert proficiency in Python, Go, Java, or Bash
  • Deep understanding of Kubernetes, microservices, and service mesh
  • Advanced experience with Terraform, CloudFormation, and CI/CD
  • Mastery of observability tooling: Prometheus, Grafana, Datadog, and OpenTelemetry

Nice-to-have

  • Experience with chaos engineering, error budgets, and SLO adoption
  • Prior work in high-traffic media streaming or advertising platforms
  • Familiarity with big data ecosystems such as Kafka, Spark, and Hadoop
  • Hands-on experience with security, compliance, and governance (SOC 2, GDPR, ISO 27001)
  • Published contributions or conference talks on reliability topics

Key Requirements

  • Bachelor's or Master's degree in Computer Science or Engineering
  • 4+ years of experience in site reliability or production engineering
  • Proven track record managing cloud-native environments (AWS, Azure, GCP)

Work Rights

Not specified
