Lead Software Engineer, DevOps Domain (Bangkok Based, Relocation Provided)

Agoda

Bangkok, Thailand
On-site
SRE platforms and reliability initiatives
SLI/SLO-driven engineering
Canary releases and automated rollback
Agoda is seeking a Lead Software Engineer in the DevOps domain to work on-site in Bangkok, Thailand. The role focuses on leading technical initiatives related to Site Reliability Engineering (SRE) and improving system resilience, while promoting best practices across the organization.

Job Summary

  • Lead the technical vision, architecture, and execution of new SRE platforms or reliability initiatives, defining and promoting SRE best practices across Agoda’s services.
  • Design, build, and operate reliability platforms including load shedding, business signals monitoring, and safe-deployment automation to reduce blast radius while preserving developer velocity.
  • Advance platform observability and reliability signals using Prometheus and Grafana, balancing actionability, scale, and cost efficiency, while defining reliability roadmaps and OKRs.

Skills & Requirements

Must-have

  • SRE platforms and reliability initiatives
  • SLI/SLO-driven engineering
  • Canary releases and automated rollback
  • Kubernetes ecosystem and service mesh
  • Prometheus, Grafana, and observability stacks
  • Incident management lifecycle
  • Distributed systems fundamentals

Nice-to-have

  • Chaos engineering and resilience testing
  • ML-assisted detection
  • Multi-region/multi-DC architectures
  • Scaling org-wide SLO/SRE frameworks

Key Requirements

  • Ownership of architecting, building, and operating production systems
  • Lead and coordinate complex cross-team initiatives
  • Expertise in Go, Python, Rust, or Java
  • Deep hands-on experience with Kubernetes
  • Observability & monitoring expertise
  • Experience with reliability engineering patterns
  • Solid data analysis skills, including SQL
  • Excellent communication and collaboration skills

Work Rights

Not specified

Sponsorship: available
