Senior Site Reliability Engineer, Data Infrastructure

CoreWeave Inc

New York, NY, US
$165,000 to $242,000; discretionary bonus, equity ...
On-site

Job Summary

  • The Platform & Infrastructure Engineering team is responsible for the reliability, scalability, and security of the company’s data platform, building and operating foundational systems for data ingestion, transformation, analytics, and AI workloads.
  • As a Senior Site Reliability Engineer, you will own the reliability and performance of our Kubernetes-based data platform, designing and operating highly available, multi-region systems.
  • CoreWeave offers a comprehensive benefits program including 100% paid medical, dental, and vision insurance, a discretionary bonus, equity awards, and a 401(k) with a generous employer match.

Salary

$165,000 to $242,000; discretionary bonus, equity awards; medical, dental, vision, life insurance, 401(k) match, PTO

Skills & Requirements

Must-have

  • Kubernetes and containerized services
  • CI/CD systems (Argo CD, GitHub Actions)
  • High availability for production systems
  • Geo-replicated multi-region systems
  • Observability (Prometheus, Grafana, OpenTelemetry)
  • Infrastructure as code (Helm, Terraform)

Nice-to-have

  • Operating data platforms
  • Service mesh technologies
  • Building internal developer platforms
  • Curiosity about system resilience
  • Diagnosing complex distributed systems

Key Requirements

  • 5+ years of experience in SRE, platform, or infrastructure engineering
  • Experience meeting high availability requirements (≥99.99% uptime)
  • Experience with automated environment provisioning
  • Strong understanding of system performance tuning
  • Experience implementing security best practices

Work Rights

Not specified
