Job Summary
This role involves building and scaling critical infrastructure across global data centers, multiple cloud platforms, and on-premise systems to drive stability at scale.
You will implement automation for self-healing, fault-tolerant infrastructure using declarative configurations and event-driven workflows while developing internal tools to eliminate repetitive tasks.
The position requires participating in an on-call rotation, conducting incident reviews, identifying root causes, and writing Root Cause Analysis (RCA) reports to maintain the highest level of uptime.
Skills & Requirements
Must-have
3+ years distributed cloud experience
Kubernetes administration and troubleshooting
Go or Python software development
Terraform and Ansible automation
Linux networking and kernel debugging
Nice-to-have
Creative problem-solving skills
Contribution to a collaborative team culture
Experience with multiple public clouds
On-premise system management
Key Requirements
Bachelor's degree in Computer Science or relevant education
At least 3 years of experience managing distributed cloud environments
Deep expertise in container orchestration with Kubernetes