Job Summary
The Site Reliability Engineer serves as the guardian of our production systems, ensuring the reliability, scalability, and performance of our IoT telemetry platform.
Responsibilities include defining and enforcing SLOs, automating operational processes, and building infrastructure and tooling to enable engineering teams to deploy with confidence.
The role includes participating in a follow-the-sun on-call rotation that provides 24x7 support coverage across multiple time zones.
Skills & Requirements
Must-have
Monitoring and alerting with Prometheus, Grafana, and PagerDuty
Infrastructure as Code with Pulumi
AWS EKS, MSK, SingleStore, MongoDB
Incident response and post-mortem leadership
Security patch pipelines and vulnerability remediation