This role operates across Azure, AWS, GCP, and on-prem environments, embedded in the broader enterprise resiliency and production reliability strategy
Job Summary
This role operates across Azure, AWS, GCP, and on-prem environments, embedded in the broader enterprise resiliency and production reliability strategy.
Core responsibilities include deep investigations, advanced observability, auto-healing tooling, SLI/SLO stewardship, and business-aligned reliability reporting.
We offer a competitive salary range, annual bonus target, flexible work arrangements, hybrid work model, and multiple benefits supporting physical and mental wellbeing.
Matching Summary
This role operates across Azure, AWS, GCP, and on-prem environments, embedded in the broader enterprise resiliency and production reliability strategy.
Salary
Base: 109,900 - 134,300; Bonus/Equity: 15% annual bonus target, up to double target, ESPP with 50% match; Benefits: Multiple benefits offered, Wellness account, Share plan, Defined benefit pension plan
Skills & Requirements
Must-have
Advanced observability tooling
Auto-healing and reliability tooling
SLI/SLO stewardship
Cloud and platform reliability
AI for reliability
Chaos engineering and DR exercises
Nice-to-have
Coaching and leadership skills
Automation-first culture
Business-aligned reliability reporting
Resilience culture promotion
Key Requirements
8+ years of experience in SRE/Platform/Infrastructure/Software Engineering
Proficiency in Observability (OpenTelemetry, Dynatrace, Elastic)
Proficiency in Reliability engineering (SLIs/SLOs/SLAs)
Proficiency in CI/CD and deployment patterns
Proficiency in Kubernetes and service meshes
Proficiency in Data and event systems
Proficiency in Networking and traffic management
Solid software engineering skills (Go, Python, or TypeScript)
Experience with IaC (Terraform), GitOps (Argo CD/Flux)