Job Summary
Apply software engineering techniques, automation, and best practices in incident response to ensure the reliability, availability, and scalability of systems, platforms, and technology.
Develop tools and scripts to automate operational processes, reducing manual workload, increasing efficiency, and improving system resilience.
Collaborate with development teams to integrate best practices for reliability, scalability, and performance into the software development lifecycle.
Skills & Requirements
Must-have
GCP expertise
Python and Terraform proficiency
CI/CD pipelines in GitLab
GCP landing zone configuration
Cloud deployment and monitoring tools
Nice-to-have
ITIL Framework familiarity
AWS/Azure working knowledge
Technical documentation writing
Cross-functional collaboration skills
Creative problem-solving
Key Requirements
Google Cloud certification (Professional and/or Associate level)
Hands-on coding experience in Python and Terraform (within the last 2 years)
Experience with CI/CD pipelines in GitLab
Experience with GCP landing zone configuration and troubleshooting
Ability to identify root causes of instability in high-traffic systems