Senior Site Reliability Engineer, Production Engineering
Anduril
Costa Mesa, United States
$166,000 to $220,000 USD per year
On-site
Kubernetes production environments
Observability stacks (Prometheus, Grafana)
Cloud platforms (AWS, Azure, GCP)
Job Summary
Design and implement comprehensive monitoring, observability, and alerting systems to ensure early detection of reliability issues across the Lattice platform.
Drive incident response and conduct blameless postmortems to identify systemic improvements and prevent recurrence of production issues.
Implement security best practices and compliance controls for production environments handling sensitive defense data.