Site Reliability Engineering Manager

RapidSOS

Boston, MA, US
Base: $185,000 - $215,000; bonus/equity: eligible ...
On-site
7+ years in SRE or platform engineering
Kubernetes and AWS production experience
Terraform/Atlantis infrastructure as code
This role leads the SRE Operations team through a transition from NOC-style operations to a more engineering-focused, proactive reliability model.

Job Summary

  • This role leads the SRE Operations team through a transition from NOC-style operations to a more engineering-focused, proactive reliability model.
  • The manager owns the reliability of the Kubernetes clusters and AWS infrastructure while pushing product teams to take ownership of their own services.
  • The successful candidate will shape the team's long-term AI strategy for infrastructure and manage the reserved instance strategy to control AWS costs.

Salary

Base: $185,000 - $215,000
Bonus/Equity: Eligible for equity options
Benefits: Competitive salary and benefits package

Skills & Requirements

Must-have

  • 7+ years in SRE or Platform Engineering
  • Kubernetes and AWS production experience
  • Terraform/Atlantis Infrastructure as Code
  • Python scripting and code review skills
  • SLOs, error budgets, and blameless postmortems

Nice-to-have

  • Experience moving teams from reactive to proactive ops
  • AI-driven automation strategy for infrastructure
  • Chaos engineering and failure mode analysis
  • Experience leading and growing high-impact teams
  • Collaboration with product teams on operational readiness

Key Requirements

  • 7+ years SRE/DevOps experience
  • 2+ years team management responsibility
  • Production Kubernetes and AWS expertise
  • Hands-on Terraform and Python proficiency

Work Rights

Not specified
