Job Summary
This senior hands-on role drives enterprise reliability outcomes by designing intelligent, automated, and AI-assisted solutions for resilient systems.
The position focuses on building self-healing workflows that automatically detect, diagnose, and remediate failures using telemetry and AI models.
Candidates will collaborate across teams to embed reliability into the SDLC, reducing operational toil and improving developer productivity through automation.
Skills & Requirements
Must-have
7+ years of SRE or DevOps experience
Automation skills (Python, Go, Java)
Self-healing system design expertise
AI/ML anomaly detection implementation
Cloud platforms (AWS, Azure, GCP)
Kubernetes, OpenShift, and microservices
Observability (Dynatrace, Prometheus, Grafana)
Nice-to-have
Chaos engineering and resilience testing
Generative AI for runbook generation
ServiceNow CMDB service modeling
Building enterprise SRE enablement platforms
Mentoring engineers in reliability culture
Support for mainframe-integrated systems
Delivering quantifiable MTTR reductions
Key Requirements
Bachelor's degree in Computer Science or related field
7+ years of experience in SRE or production software engineering
Proven track record delivering enterprise-scale automation