Site Reliability Engineer - Multicloud Platform

Workday

Fully remote
3+ years SRE experience in distributed systems
Strong Kubernetes experience in public cloud
Proficiency in Go, Python, or Ruby programming

Job Summary

  • The primary function of the team is to ensure the reliability and availability of the platform to meet desired SLAs while reducing operational load.
  • Engineers will own the reliability of the complete stack that delivers Workday products across public clouds on a Kubernetes foundation.
  • The role involves developing effective SLIs, building an extensible observability architecture, and establishing new processes to improve customer happiness.

Skills & Requirements

Must-have

  • 3+ years SRE experience in distributed systems
  • Strong Kubernetes experience in public cloud
  • Proficiency in Go, Python, or Ruby programming
  • Experience with AWS, GCP, or Azure cloud platforms
  • Linux operating system administration skills

Nice-to-have

  • Passion for automation and reducing operational toil
  • Experience collaborating with global remote teams
  • Excellent documentation and runbook development skills
  • Participation in cloud native conferences such as KubeCon

Key Requirements

  • BS in Computer Science or equivalent experience
  • 1-3+ years handling distributed systems in public cloud
  • 1-3+ years SRE experience in a distributed systems environment

Work Rights

Not specified
