Site Reliability Engineer - Multicloud Platform

Workday

Fully remote

Job Summary

  • The primary function of the SRE team is to ensure the reliability and availability of the platform to meet desired SLAs while reducing operational load.
  • Engineers will own the reliability of the complete stack and tooling that deliver Workday products across public clouds, using cloud-native technologies such as Kubernetes.
  • The role involves developing effective SLIs, building an extensible observability architecture, and partnering with service teams to implement SRE standards.

Skills & Requirements

Must-have

  • 3+ years SRE experience in distributed systems
  • Strong Kubernetes experience in public cloud
  • Proficiency in Go (Golang), Python, or Ruby
  • Experience with AWS, GCP, or Azure environments
  • Linux system administration skills

Nice-to-have

  • Passion for automation and reducing operational toil
  • Experience collaborating with global remote teams
  • Excellent documentation and runbook development skills
  • Experience presenting at cloud-native conferences
  • Ability to work independently in fast-paced environments

Key Requirements

  • BS in Computer Science or equivalent experience
  • 1-3+ years handling distributed systems in public cloud
  • 1-3+ years of SRE experience in distributed systems

Work Rights

Not specified