Senior Site Reliability Engineer, Production Engineering

Anduril

Costa Mesa, United States
$166,000–$220,000 USD per year
On-site

Job Summary

  • Design and implement comprehensive monitoring, observability, and alerting systems to ensure early detection of reliability issues across the Lattice platform.
  • Drive incident response and conduct blameless postmortems to identify systemic improvements and prevent recurrence of production issues.
  • Implement security best practices and compliance controls for production environments handling sensitive defense data.

Salary

$166,000—$220,000 USD

Skills & Requirements

Must-have

  • Kubernetes production environments
  • Observability stacks (Prometheus, Grafana)
  • Cloud platforms (AWS, Azure, GCP)
  • Infrastructure as code
  • Incident response and postmortems
  • System architecture for reliability

Nice-to-have

  • Defense or mission-critical systems
  • Chaos engineering principles
  • Service mesh technologies
  • Database operations and optimization
  • CI/CD platforms and deployment automation

Key Requirements

  • 7+ years engineering experience
  • 3+ years SRE/production operations experience
  • Bachelor's degree in Computer Science or equivalent
  • U.S. Person status
  • Eligible for U.S. Secret clearance

Work Rights

Must be a U.S. Person
