Job Summary
The primary function of the SRE team is to ensure the reliability and availability of the platform to meet desired SLAs while reducing operational load.
Engineers own the reliability of the complete stack and the tools that deliver Workday products across public clouds using cloud-native technologies.
The role requires developing effective SLIs, building an extensible observability architecture, and establishing runbook automation to improve customer happiness.
Skills & Requirements
Must-have
Kubernetes experience
Go (Golang) programming proficiency
Public cloud experience (AWS, GCP, Azure)
Linux operating system expertise
Distributed systems troubleshooting
CI/CD and code management
Nice-to-have
Istio service mesh knowledge
Open Policy Agent (OPA) policy management
Prometheus and Grafana monitoring
KubeCon conference experience
Scrum agile methodology
Follow-the-sun on-call rotation
Key Requirements
BS in Computer Science or equivalent experience
3+ years SRE experience in distributed systems
1+ years handling distributed systems in public cloud