Job Summary
Act as the subject-matter expert (SME) on operations automation and monitoring: identify TOIL in the team's existing systems and processes, then recommend and implement automated solutions that reduce TOIL and improve the team's efficiency and effectiveness.
Work as part of an Agile team to define the target-state infrastructure architecture of applications from a reliability standpoint.
As part of our flexible scheme, here are just some of the benefits that you’ll enjoy:
Best-in-class leave policy
Gender-neutral parental leave
100% reimbursement under the childcare assistance benefit (gender neutral)
Skills & Requirements
Must-have
Cloud engineering experience
Operation automation and monitoring SME
Identify and reduce TOIL
CI/CD pipeline setup and management
Infrastructure as Code (Terraform)
Containerization (Docker, Kubernetes)
Windows or Linux/Unix administration
Nice-to-have
ITSM process understanding
Microservices knowledge
AI-driven automation tools
AI-based observability platforms
AI/ML for incident response
Key Requirements
8+ years of industry experience
Production experience on GCP or another public cloud
GCP Services: Compute, Networking, Security
Container Orchestration
Backup and Recovery methodology on GCP
AI/ML technologies and frameworks
SRE core principles
Configuration management tool experience
Continuous integration and continuous deployment (CI/CD) pipeline experience