Principal Site Reliability Engineer
Job Summary
The Principal Site Reliability Engineer will safeguard operational excellence by defining SLIs, SLOs, and Error Budgets for mission-critical platforms.
This role requires driving production incident response, leading root cause analysis, and collaborating with development teams to implement stability improvements.
Candidates must possess strong expertise in Azure resources, DevOps pipelines, and monitoring tools like Grafana and Dynatrace to reduce toil and ensure scalability.
Skills & Requirements
Must-have
Azure VMs, Storage, Network, and Functions
Azure DevOps and GitOps pipelines
Grafana, Dynatrace, and Splunk monitoring
C# programming proficiency
Production incident response leadership
Nice-to-have
Terraform, Docker, and Kubernetes experience
Navitaire product knowledge
Infrastructure automation with PowerShell
Multi-cultural team collaboration
Shift schedule flexibility
Key Requirements
Proficient in C# programming
Hands-on Azure infrastructure experience
Experience with monitoring tools (Grafana, Dynatrace, Splunk)