Site Reliability Engineer (SRE)

Acquire.ai

Taguig City, Philippines
On-site
Define and enforce SLOs
Infrastructure as Code with Pulumi
AWS EKS, MSK, SingleStore, MongoDB
Acquire.ai is seeking a Site Reliability Engineer to ensure the reliability and performance of its IoT telemetry platform. The role involves defining Service Level Objectives, automating processes, and leading incident response efforts.

Job Summary

  • The Site Reliability Engineer serves as the guardian of our production systems, ensuring the reliability, scalability, and performance of our IoT telemetry platform.
  • Responsibilities include defining and enforcing SLOs, automating operational processes, designing IaC solutions with Pulumi, managing AWS services, and leading incident response.
  • Participate in a follow-the-sun on-call rotation to provide 24x7 support across multiple time zones and ensure operational excellence.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • Define and enforce SLOs
  • Infrastructure as Code with Pulumi
  • AWS EKS, MSK, SingleStore, MongoDB
  • Prometheus, Grafana, PagerDuty monitoring
  • Incident commander role
  • Security and compliance support

Nice-to-have

  • Automate operational processes
  • Continuous improvement of on-call experience
  • Data-driven deployment decisions

Key Requirements

  • Experience with AWS services
  • Proficiency in Infrastructure as Code
  • Experience with monitoring tools
  • Experience with incident response

Work Rights

Not specified
