Site Reliability Engineer - Multicloud Platform

Workday

Fully remote

Job Summary

  • The primary function of the SRE team is to ensure the reliability and availability of the platform to meet desired SLAs while reducing operational load.
  • Engineers will develop and launch effective SLIs and ensure SLOs are met by building an extensible observability architecture and runbook automation.
  • Workday's flexible work approach requires at least half of each quarter to be spent in the office or with customers.

Skills & Requirements

Must-have

  • Kubernetes experience in public cloud
  • Go (Golang) programming proficiency
  • Linux operating system expertise
  • Distributed systems troubleshooting
  • CI/CD and code management
  • AWS, GCP, or Azure cloud platforms

Nice-to-have

  • Istio service mesh knowledge
  • OPA policy management
  • Prometheus and Grafana monitoring
  • Runbook automation development
  • Scrum team collaboration
  • Cloud Native conference experience

Key Requirements

  • BS in Computer Science or equivalent experience
  • 3+ years of SRE experience with distributed systems
  • For junior roles: 1+ years of experience handling distributed systems in public cloud
  • Proficiency in Go, Python, or Ruby

Work Rights

Not specified