Staff Site Reliability Engineer – Automation And Platform

Cerebras Systems

Remote
Self-service delivery pipelines
Shared observability and common tooling
GitOps-driven CD

Job Summary

  • As a Staff SRE, you will lead the engineering effort to eliminate toil at scale by driving the implementation of self-service delivery pipelines and shared observability and common tooling.
  • Your primary focus shifts to architecting and delivering the "tomorrow" layer: declarative, GitOps-driven CD for model releases, capacity provisioning, and cluster upgrades.
  • This work will shift reliability from an ops-only burden to a shared engineering discipline that underpins frontier AI inference at scale.

Skills & Requirements

Must-have

  • self-service delivery pipelines
  • shared observability and common tooling
  • GitOps-driven CD
  • capacity provisioning
  • cluster upgrades
  • reliability guarantees
  • automation of toil

Nice-to-have

  • mentoring early-career SREs
  • transforming complexity into reliability
  • shared engineering discipline

Key Requirements

  • Staff+ engineer experience
  • Hands-on operational immersion
  • No 24/7 on-call rotations

Work Rights

Not specified
