Site Reliability Engineer (Edge Services), Infrastructure Services

Apple

United States of America
Not specified
Not specified (assumed hybrid based on industry trends).
Linux internals and deep networking expertise
HTTP/2, HTTP/3, QUIC, and HTTPS/TLS protocol debugging
Python or Go automation experience
Apple is seeking a proactive Site Reliability Engineer to enhance its production ecosystems by implementing advanced observability and automation strategies. The role involves ensuring the resilience and scalability of services while collaborating closely with development teams to integrate reliability measures into the CI/CD pipeline.

Job Summary

  • You will champion the evolution of production ecosystems by building a sophisticated, data-driven reliability framework that goes beyond simple uptime metrics.
  • Your mission involves designing next-generation observability and alerting strategies that prioritize high-cardinality data and meaningful signals over noise.
  • You will focus on building self-healing systems and reducing toil through aggressive automation while partnering with development teams to bake reliability into the CI/CD pipeline.

Matching Summary

Match Score: 85

Salary

Not specified

Skills & Requirements

Must-have

  • Linux internals and deep networking expertise
  • HTTP/2, HTTP/3, QUIC, and HTTPS/TLS protocol debugging
  • Python or Go automation experience
  • Prometheus, Grafana, and ClickHouse monitoring configuration
  • SLI, SLO, error budget, and release management knowledge

Nice-to-have

  • AWS, GCP, and Azure cloud environment management
  • Kubernetes container orchestration and security
  • Leading blameless post-mortems
  • Consulting with product teams on service design
  • Applying Generative AI tools in SRE workflows

Key Requirements

  • Understanding of Linux internals and deep expertise in networking
  • Proven ability to automate tasks using Python or Go
  • Grasp of Data Structures and Algorithms (DSA)
  • Practical knowledge of SLIs, SLOs, error budgets, and incident management

Work Rights

Not specified