Site Reliability Engineer - Multicloud Platform

Workday

Fully remote
Kubernetes experience required
Public cloud infrastructure (AWS/GCP/Azure)
Golang programming proficiency

Job Summary

  • The primary function of the team is to ensure the reliability and availability of the platform to meet desired SLAs while reducing operational load.
  • Engineers will own reliability for the complete stack across public clouds, built on a Kubernetes foundation designed from scratch for the cloud.
  • Workday offers a flexible work approach: at least half of the time each quarter is spent in-office or with customers, combined with remote flexibility.

Skills & Requirements

Must-have

  • Kubernetes experience required
  • Public cloud infrastructure (AWS/GCP/Azure)
  • Golang programming proficiency
  • Linux operating system expertise
  • Distributed systems troubleshooting
  • CI/CD and code management
  • SRE operational toil reduction

Nice-to-have

  • Istio service mesh knowledge
  • OPA policy enforcement
  • Prometheus and Grafana monitoring
  • Scrum agile methodology experience
  • Cloud Native conference participation
  • Follow-the-sun on-call support
  • Runbook automation development

Key Requirements

  • BS in Computer Science or equivalent experience
  • 3+ years SRE experience in distributed systems
  • 1+ year handling distributed systems in public cloud
  • Proficiency in Golang, Python, or Ruby
  • Experience with standard software development methodologies

Work Rights

Not specified
