Job Summary
This senior, hands-on role drives enterprise reliability outcomes by designing intelligent, automated, and AI-assisted solutions for legacy and modern cloud-native systems.
The position focuses on building self-healing workflows that automatically detect, diagnose, and remediate failures while reducing operational toil through automation.
Candidates will collaborate with application teams and leadership to embed reliability into the SDLC and achieve measurable improvements in system availability and incident response times.
Skills & Requirements
Must-have
7+ years SRE or DevOps experience
Python, Go, Java automation skills
Self-healing system design expertise
AI/ML anomaly detection implementation
Dynatrace, Prometheus, Grafana observability
AWS, Azure, GCP cloud platforms
Kubernetes, OpenShift, microservices
Nice-to-have
Enterprise SRE enablement platform experience
Chaos engineering production testing
Generative AI operational use cases
ServiceNow CMDB service modeling
Mentoring engineers and culture shaping
Mainframe integrated system support
Key Requirements
Bachelor's degree in Computer Science or related field
7+ years of SRE, DevOps, or production software engineering experience
Demonstrated success delivering enterprise-scale automation and reliability improvements