Senior Site Reliability Engineer

SHEIN

San Diego, California, United States
On-site
Operating large-scale systems
Mission-critical production systems
Kubernetes, Kafka, Elasticsearch, Redis

Job Summary

  • Keep SHEIN’s mission-critical production systems running 24/7/365, participating in on-call rotations and acting decisively during incidents.
  • Own and operate core open-source infrastructure such as APISIX, Nginx, Kubernetes, Kafka, Elasticsearch, Redis, Consul, etcd, ZooKeeper, and other large-scale distributed systems.
  • Automate operational workflows and eliminate manual toil through scripting, tooling, and process improvements.

Matching Summary

Keep SHEIN’s mission-critical production systems running 24/7/365, participating in on-call rotations and acting decisively during incidents.

Skills & Requirements

Must-have

  • Operating large-scale systems
  • Mission-critical production systems
  • Kubernetes, Kafka, Elasticsearch, Redis
  • Observability solutions (metrics, logs, traces)
  • Automating operational workflows
  • Linux, networking, distributed systems

Nice-to-have

  • Passion for solving problems
  • Operational excellence
  • Proactively identifying risks

Key Requirements

  • Senior Site Reliability Engineer I
  • deep experience operating and evolving large-scale, mission-critical systems
  • strong software engineering skills
  • solid to deep expertise in Linux, networking, and distributed systems

Work Rights

Not specified
