Confluent is seeking a Staff Site Reliability Engineer specializing in Incident Management & Reliability to join their remote team in Canada. The ideal candidate will have extensive experience in SRE and incident management, particularly in multi-cloud environments, and will play a key role in driving proactive reliability improvements.
Job Summary
We need an expert-level engineer who can drive proactive reliability improvements that prevent incidents before they occur.
This role combines hands-on technical work with strategic program ownership, with roughly 75% of your time on engineering and 25% on teaching and coordination.
You'll be part of a global team with follow-the-sun coverage and clean handoffs that keep everyone working sustainable hours.
Skills & Requirements
Must-have
Incident management tooling expertise
Distributed systems and failure modes
Observability: metrics, logging, tracing
Kubernetes and container orchestration
CI/CD pipelines and release processes
Customer-facing incident document editing
Nice-to-have
Kafka/event streaming expertise
Driving org-wide process change
Cultural change initiatives
Key Requirements
10+ years of relevant experience
Cloud experience (AWS, GCP, or Azure)
Experience navigating reliability/incident programs at 500+ engineer organizations