Job Summary
The role is responsible for ensuring the stability, resilience, and reliability of critical IT services through advanced monitoring and incident management.
The successful candidate will act as command centre lead during critical outages, coordinating technical and business teams for rapid recovery.
Success in this position requires driving continuous improvement by automating operational tasks and implementing AIOps tools.
Skills & Requirements
Must-have
8+ years of experience
Major incident management (P1/P2)
Root cause analysis and post-incident reviews
ITIL best practices implementation
SLO/SLI/SLA development and tracking
Nice-to-have
Empathetic customer advocacy skills
Strong verbal and written communication
Collaborative cross-team leadership
Proactive risk mitigation mindset
Key Requirements
8+ years of professional experience
ITIL v4 or Service Operations certification preferred
SRE Foundation or Practitioner certification preferred
Cloud certifications (AWS, Azure, or GCP) preferred