Job Summary
You will be responsible for ensuring the reliability, scalability, and security of both business-critical internal systems and external, customer-facing services.
This role requires deep ownership of production systems; strong troubleshooting skills across infrastructure, container orchestration systems, networking, and applications; and comfort operating in a 24/7 on-call environment.
Our benefits include Health & Wellness, Flexible Downtime, Continuous Learning, Invest in Your Future, and Family Friendly Perks.
Skills & Requirements
Must-have
AWS infrastructure management
Kubernetes (EKS) operations
Terraform for infrastructure automation
Python for automation and tooling
Incident response and root cause analysis
Production system ownership and operation
Nice-to-have
Experience with Generative AI and LLMs
Contributions to open-source projects
Working in regulated environments
Familiarity with Kafka or event-driven systems
Key Requirements
6+ years of experience in SRE, DevOps, Platform, or Infrastructure Engineering
Strong software engineering background
Deep experience with AWS cloud environments
Strong hands-on expertise with Kubernetes (EKS preferred)