Site Reliability Engineer (SRE)

AIPL ACQUIRE INTELLIGENCE

Taguig City, Philippines

Job Summary

  • The Site Reliability Engineer serves as the guardian of our production systems, ensuring the reliability, scalability, and performance of our IoT telemetry platform.
  • By implementing comprehensive monitoring, incident response procedures, and reliability practices, you will play a pivotal role in maintaining the uptime and data freshness that our customers depend on for their critical fleet operations.
  • Participate in a follow-the-sun on-call rotation, with a one-week primary/secondary commitment every five weeks, providing 24×7 support coverage across the AU/NZ, EU/ZA, and MX time zones.

Skills & Requirements

Must-have

  • SLO Management and enforcement
  • Infrastructure as Code with Pulumi
  • AWS EKS, MSK, SingleStore, MongoDB
  • Incident response and post-mortems
  • Security and compliance automation
  • Monitoring and alerting with Prometheus, Grafana, and PagerDuty

Nice-to-have

  • Teamwork and innovation focus
  • Continuous improvement mindset
  • Data-driven decision making

Key Requirements

  • Must have experience with IaC solutions
  • Must have experience with AWS services
  • Must have experience with monitoring tools
  • Must participate in on-call rotation

Work Rights

Not specified
