Senior Site Reliability Engineer - Infrastructure

Kensho

Job Summary

  • You will be responsible for ensuring the reliability, scalability, and security of both business-critical internal systems and external, customer-facing services.
  • This role requires deep ownership of production systems, strong troubleshooting skills across infrastructure, container orchestration systems, networking, and applications, and comfort operating in a 24/7 on-call environment.
  • Our benefits include Health & Wellness, Flexible Downtime, Continuous Learning, Invest in Your Future, and Family Friendly Perks.

Skills & Requirements

Must-have

  • AWS infrastructure management
  • Kubernetes (EKS) operations
  • Terraform for infrastructure automation
  • Python for automation and tooling
  • Incident response and root cause analysis
  • Production system ownership and operation

Nice-to-have

  • Experience with Generative AI and LLMs
  • Contributions to open-source projects
  • Working in regulated environments
  • Familiarity with Kafka or event-driven systems

Key Requirements

  • 6+ years of experience in SRE, DevOps, Platform, or Infrastructure Engineering
  • Strong software engineering background
  • Deep experience with AWS cloud environments
  • Strong hands-on expertise with Kubernetes (EKS preferred)
  • Solid understanding of networking fundamentals
  • Experience with CI/CD pipelines
  • Comfortable conducting code reviews

Work Rights

Not specified
