Site Reliability Engineer (SRE)

SHORE Solutions Inc

Job Summary

  • The Site Reliability Engineer serves as the guardian of production systems, ensuring the reliability and scalability of an IoT telemetry platform.
  • You will define Service Level Objectives, automate operational processes using Pulumi, and lead incident response efforts to maintain critical uptime.
  • The role requires participating in a follow-the-sun on-call rotation that provides 24x7 support across the AU/NZ, EU/ZA, and MX time zones.

Skills & Requirements

Must-have

  • Service Level Objectives (SLOs) management
  • Pulumi with TypeScript for IaC
  • AWS EKS and MSK infrastructure
  • Monitoring and alerting with Prometheus, Grafana, and PagerDuty
  • Incident commander and post-mortem leadership

Nice-to-have

  • SOC2 and ISO 27001 compliance experience
  • Follow-the-sun on-call rotation participation
  • SingleStore and MongoDB database knowledge
  • Strategies for reducing alert fatigue
  • Global client mission support culture

Key Requirements

  • Experience defining and enforcing SLOs
  • Proficiency in AWS services including EKS and MSK
  • Ability to lead post-mortem processes within 48 hours of an incident
  • Knowledge of least-privilege IAM policies
  • Willingness to participate in on-call rotations

Work Rights

Not specified
