Job Summary
You’ll be a technical leader on a team responsible for improving system reliability, reducing operational toil, and establishing best practices across engineering.
Design and develop observability systems (metrics, logging, tracing, alerting) that produce actionable alerts and data with minimal noise.
This is an early position in the company's SRE function. You will have direct input into how reliability standards and practices are established, and those standards form the foundation on which product engineering builds.
Salary
$100,000 - $120,000
Skills & Requirements
Must-have
improve system reliability
reduce operational toil
establish best practices
design observability systems
lead complex incident response
eliminate toil through automation
build reusable systems
Nice-to-have
scaling SRE practices
Kubernetes/container orchestration
Infrastructure as Code
high-growth or scaling systems
performance engineering or capacity planning
Key Requirements
6 to 10+ years of experience in SRE, infrastructure, or backend systems engineering
Experience owning reliability outcomes for complex, distributed systems
Experience with cloud infrastructure (AWS, GCP, or Azure)
Experience with production-scale systems
Experience with observability, incident management, and system performance
Proficiency in at least one programming language (e.g., Go, Python, Java)
Ability to change how other teams work without managerial authority