Manager Site Reliability Engineer

Workday

Hybrid (50% in-office, flexible schedule)
3+ years leading SRE or Database Engineering teams
8+ years software or systems engineering experience
Database internals, tuning, and query optimization
Workday seeks a Manager of Site Reliability Engineering (SRE) to lead its Database Reliability Engineering team, focusing on building scalable, resilient data infrastructure. The ideal candidate will have extensive experience in database engineering, software engineering, and team leadership, and will contribute to a collaborative and innovative work environment.

Job Summary

  • The role involves leading a visionary team to replace manual database interventions with automated, self-healing platforms.
  • Candidates must have extensive experience managing thousands of production databases across multiple data centers and public clouds.
  • Workday offers a flexible work model that requires spending at least 50% of each quarter in the office or with customers.

Skills & Requirements

Must-have

  • 3+ years leading SRE or Database Engineering teams
  • 8+ years software or systems engineering experience
  • Database internals, tuning, and query optimization
  • Managing databases within Kubernetes using Operators
  • Implementing automated failover mechanisms
  • Experience with AWS RDS/Aurora and GCP Cloud SQL
  • Reducing Mean Time to Resolution for critical outages

Nice-to-have

  • Passion for Open-Source and Cloud Native solutions
  • Fostering a culture of psychological safety
  • Deep-dive troubleshooting in Linux internals
  • Mentoring senior engineers
  • Sun-drenched optimism and courage
  • Experience with Prometheus, Grafana, Datadog

Key Requirements

  • Bachelor's degree in Computer Science or related field
  • 4+ years as an SRE/DBRE designing resilient data infrastructure
  • 5+ years spearheading high-stakes response for critical data outages

Work Rights

Not specified
