Job Summary
The primary function of the SRE team is to ensure the reliability and availability of the platform to meet desired SLAs while reducing operational load.
Engineers will define and roll out effective SLIs and ensure SLOs are met by building an extensible observability architecture and runbook automation.
Workday's flexible work approach requires spending at least half of each quarter in the office or with customers.
Skills & Requirements
Must-have
Kubernetes experience in public cloud
Go (Golang) programming proficiency
Linux operating system expertise
Distributed systems troubleshooting
CI/CD and code management
AWS, GCP, or Azure cloud platforms
Nice-to-have
Istio service mesh knowledge
OPA policy management
Prometheus and Grafana monitoring
Runbook automation development
Scrum team collaboration
Cloud Native conference experience
Key Requirements
BS in Computer Science or equivalent experience
3+ years of SRE experience with distributed systems
1+ year of experience operating distributed systems in public cloud (for junior roles)