Job Summary
Keep SHEIN’s mission-critical production systems running 24/7/365, participating in on-call rotations and acting decisively during incidents.
Own and operate core open-source infrastructure such as APISIX, NGINX, Kubernetes, Kafka, Elasticsearch, Redis, Consul, etcd, ZooKeeper, and other large-scale distributed systems.
Automate operational workflows and eliminate manual toil through scripting, tooling, and process improvements.
Skills & Requirements
Must-have
operating large-scale systems
mission-critical production systems
Kubernetes, Kafka, Elasticsearch, Redis
observability solutions (metrics, logs, traces)
automate operational workflows
Linux, networking, distributed systems
Nice-to-have
passion for solving problems
operational excellence
proactively identify risks
Key Requirements
Senior Site Reliability Engineer I
deep experience operating and evolving large-scale, mission-critical systems
strong software engineering skills
solid to deep expertise in Linux, networking, and distributed systems