Lead Software Engineer, DevOps Platform (Bangkok-based, Relocation Provided)

Agoda

Bangkok, Thailand
On-site

Job Summary

  • Lead the technical vision, architecture, and execution of new SRE platforms or reliability initiatives.
  • Design, build, and operate reliability platforms including load shedding, business signals monitoring, and safe-deployment automation.
  • Maintain and evolve incident, observability, alerting, and on-call tooling, improving signal quality and reducing time-to-mitigation.

Skills & Requirements

Must-have

  • SRE platform architecture and execution
  • SLI/SLO-driven engineering
  • Kubernetes ecosystem and service mesh
  • Prometheus and Grafana observability
  • Incident management lifecycle
  • Canary deployments and automated rollback

Nice-to-have

  • Chaos engineering and resilience testing
  • ML-assisted detection
  • Multi-region/multi-DC architectures
  • Scaling org-wide SLO/SRE frameworks

Key Requirements

  • 8+ years of relevant experience
  • Architecting, building, and operating production systems
  • Leading complex cross-team initiatives
  • Expertise in Go, Python, Rust, or Java
  • Hands-on Kubernetes and service mesh experience
  • Observability and monitoring expertise
  • Strong incident management skills
  • Experience with reliability engineering patterns
  • Solid data analysis skills, including SQL
  • Excellent communication and collaboration skills

Work Rights

Not specified

Sponsorship: available
