Job Summary
The Site Reliability Engineer serves as the guardian of our production systems, ensuring the reliability, scalability, and performance of our IoT telemetry platform.
Responsibilities include defining and enforcing service level objectives (SLOs), automating operational processes, and building the infrastructure and tooling that let engineering teams deploy with confidence.
Join the A-Team and experience the A-Life!
Skills & Requirements
Must-have
Prometheus, Grafana, and PagerDuty
Infrastructure as Code (IaC) with Pulumi
AWS EKS, MSK, SingleStore, MongoDB
Incident response and post-mortem leadership
Security patch pipelines and vulnerability remediation
SOC 2 and ISO 27001 compliance support
Nice-to-have
Teamwork and innovation culture
Data-driven decision making
Continuous improvement of on-call experience
Key Requirements
Experience with AWS services
Experience with IaC solutions
Experience with monitoring and alerting tools
Experience with incident response procedures
Experience with security and compliance initiatives