Senior Site Reliability Engineer

Stability AI

United States
On-site
Stability AI is seeking a Senior Site Reliability Engineer to enhance its cloud infrastructure, collaborating with various teams to ensure system reliability and drive innovation. The ideal candidate has strong skills in cloud architecture, infrastructure as code, and incident management, with a focus on AWS and related technologies.

Job Summary

  • Stability AI is seeking a Senior Site Reliability Engineer to shape and improve its evolving cloud infrastructure.
  • The role involves architecting scalable systems in AWS with a focus on high availability and resilience.
  • Candidates will collaborate across engineering, IT, and security teams to drive innovation and enforce SRE best practices.


Skills & Requirements

Must-have

  • AWS cloud environment management
  • Terraform infrastructure as code
  • Kubernetes container scaling
  • Monitoring with Grafana and the ELK stack
  • CI/CD pipeline enhancement

Nice-to-have

  • Mentoring junior team members
  • Championing SRE principles
  • Driving incident root cause analysis
  • Background in cloud security

Key Requirements

  • Experience scaling resource-intensive systems
  • Background in software development or automation scripting
  • Knowledge of Kubernetes or container solutions

Work Rights

Not specified
