Job Summary
The Site Reliability Engineer serves as the guardian of our production systems, ensuring the reliability, scalability, and performance of our IoT telemetry platform.
By implementing comprehensive monitoring, incident response procedures, and reliability practices, you will play a pivotal role in maintaining the uptime and data freshness that our customers depend on for their critical fleet operations.
You will participate in a follow-the-sun on-call rotation with a one-week primary/secondary commitment every five weeks, providing 24x7 coverage across the AU/NZ, EU/ZA, and MX time zones.
Skills & Requirements
Must-have
SLO Management
Infrastructure Automation
Incident Response
Security & Compliance
Prometheus, Grafana, PagerDuty
Pulumi with TypeScript
AWS (EKS, MSK, S3), SingleStore, MongoDB
Nice-to-have
Teamwork and innovation focus
Data-driven decision making
Continuous improvement mindset
Key Requirements
Experience with infrastructure as code (IaC) tools
Experience managing AWS services
Experience leading incident response
Experience with security and compliance initiatives