Site Reliability Engineer - Multicloud Platform

Workday

Fully remote
Kubernetes experience required
GoLang programming proficiency
Public cloud (AWS, GCP, Azure)

Job Summary

  • The primary function of the SRE team is to ensure the reliability and availability of the platform to meet desired SLAs while reducing operational load.
  • Engineers will own the reliability for the complete stack and tools that deliver Workday products across public clouds using Cloud Native technologies.
  • The role requires developing effective SLIs, building an extendable Observability architecture, and establishing runbook automation to improve customer happiness.

Skills & Requirements

Must-have

  • Kubernetes experience required
  • GoLang programming proficiency
  • Public cloud (AWS, GCP, Azure)
  • Linux operating system expertise
  • Distributed systems troubleshooting
  • CI/CD and code management

Nice-to-have

  • Istio service mesh knowledge
  • OPA policy management
  • Prometheus and Grafana monitoring
  • KubeCon conference experience
  • Scrum agile methodology
  • Follow-the-sun on-call rotation

Key Requirements

  • BS in Computer Science or equivalent experience
  • 3+ years SRE experience in distributed systems
  • 1+ years handling distributed systems in public cloud
  • Proficiency in GoLang, Python, or Ruby

Work Rights

Not specified
