Senior Site Reliability Engineer

WorldQuant

Hanoi, Vietnam
On-site

Job Summary

  • Design and develop automation, monitoring, CI/CD, and reliability features for the data onboarding pipeline.
  • Build observability solutions using Grafana, the ELK stack, and Vector.
  • Participate in the on-call rotation, respond to production incidents, and drive post-mortems.


Skills & Requirements

Must-have

  • Python scripting and automation
  • Kubernetes and Docker expertise
  • Observability with Grafana, the ELK stack, and Vector
  • CI/CD pipeline design
  • Infrastructure-as-code (Ansible)
  • Linux system administration
  • Message queues (Kafka, Redis, Celery)

Nice-to-have

  • Collaboration with engineering, analyst, and research teams
  • Continuous-improvement mindset
  • Intellectual horsepower and outstanding talent

Key Requirements

  • 8+ years of experience in SRE, DevOps, or platform engineering
  • Linux expertise
  • Python proficiency
  • Kubernetes & containers experience
  • Observability stack experience
  • CI/CD & infrastructure-as-code experience
  • Database working knowledge
  • Message queues & streaming experience
  • Networking & APIs understanding
  • Incident management experience

Work Rights

Not specified
