Head of Site Reliability Engineering

AIPL ACQUIRE INTELLIGENCE

Hybrid

Job Summary

  • This role involves building a Site Reliability Engineering team from the ground up for a mission-critical IoT platform.
  • The successful candidate will own the reliability of production services running on AWS while steering the roadmap for platform resilience.
  • You will lead a remote team by coaching engineers, setting Service Level Objectives, and enforcing blameless post-mortem processes.

Skills & Requirements

Must-have

  • AWS production services ownership
  • Infrastructure as Code with Pulumi (TypeScript)
  • SLO and error budget management
  • Kubernetes architecture on Amazon EKS
  • Incident command and post-mortem leadership

Nice-to-have

  • Fostering a blameless culture
  • Remote team scaling experience
  • DevSecOps practices
  • IoT platform background
  • Cost optimization initiatives

Key Requirements

  • Experience leading SRE teams
  • Proficiency in Pulumi and TypeScript
  • Deep knowledge of AWS and Kubernetes

Work Rights

Not specified