Site Reliability Engineer, AVP

Deutsche Bank

Pune, India

Job Summary

  • Act as the SME on operation automation and monitoring: identify TOIL in the team's existing systems and processes, then recommend and implement automated solutions that reduce TOIL and improve the team's efficiency and effectiveness.
  • Work as part of an Agile team to define the target-state infrastructure architecture of applications from a reliability standpoint.
  • Develop, improve, and maintain internal operations tools, such as deployment, monitoring, statistics, and platform management tools.

Matching Summary

Act as the SME on operation automation and monitoring: identify TOIL in the team's existing systems and processes, then recommend and implement automated solutions that reduce TOIL and improve the team's efficiency and effectiveness.

Skills & Requirements

Must-have

  • Cloud engineering experience
  • Operation automation and monitoring
  • Identify and reduce TOIL
  • GCP or other Public Cloud production experience
  • Containerization (Docker, Kubernetes)
  • Infrastructure as Code (Terraform)
  • Configuration management tool experience

Nice-to-have

  • ITSM process understanding
  • Microservices knowledge
  • AI/ML for operational efficiency
  • AI-based observability platforms
  • AI/ML for incident response

Key Requirements

  • 8+ years of industry experience
  • GCP Services expertise
  • Container Orchestration knowledge
  • Windows or Linux/Unix administration
  • Unix shell scripting (bash)
  • CI/CD Pipelines and tooling experience

Work Rights

Not specified