Software Engineer, Site Reliability

FAL

San Francisco, CA, United States
On-site

Job Summary

  • The role involves owning the reliability and availability of customer-facing systems from Kubernetes clusters to deployment pipelines.
  • Candidates are expected to use AI to automate the analysis and resolution of production issues and to speed up software development.
  • The company offers compensation of $180,000 to $250,000 plus equity, along with relocation assistance for San Francisco hires.

Salary

Base: $180,000-$250,000, plus equity; Benefits: health, dental, and vision insurance

Skills & Requirements

Must-have

  • 5+ years managing critical production systems
  • Kubernetes infrastructure at scale
  • CI/CD pipelines and GitOps workflows
  • Linux networking and container networking
  • Python and Go or Bash proficiency
  • Prometheus, Grafana, and monitoring tools

Nice-to-have

  • GPU and AI/ML workload management
  • Kernel-based monitoring with eBPF
  • Experience with security tooling such as Falco
  • Bare metal Kubernetes networking
  • Distributed storage systems knowledge

Key Requirements

  • 5+ years of experience with production systems
  • Strong Kubernetes operational experience
  • Proficiency in Python and Go or Bash

Work Rights

Not specified
