Job Summary
The Senior Site Reliability Engineer will lead reliability engineering initiatives across the Azure estate and Command Center operations to ensure uptime and rapid incident response.
This role focuses on implementing monitoring-as-code, optimizing alerting, and building self-healing automation to reduce toil and accelerate recovery times.
Candidates must partner with product engineering and platform teams to deliver measurable improvements in service reliability while adhering to strict governance and compliance standards.
Skills & Requirements
Must-have
7+ years SRE/DevOps experience
Expertise running Azure in production at scale
Terraform or Bicep Infrastructure as Code
PowerShell and Python automation skills
Knowledge of AKS, App Services, Functions, and networking
Azure Monitor, Log Analytics, and Application Insights
ITSM integration experience (ServiceNow, Jira)
Nice-to-have
Chaos testing and resilience pattern experience
Major Incident Command leadership capability
Strong interpersonal communication skills
Experience with ITRS Geneos observability tool
Ability to work varied shifts including nights
Key Requirements
Bachelor's degree in Computer Science or IT field
Proof of authorization to work in the United States
Must be available to work varied shifts, including nights, weekends, and holidays
7+ years of experience in SRE, DevOps, or Platform roles
4+ years focused on Azure in production at scale
Work Rights
Proof of authorization to work in the United States required