Job Summary
As a foundational member of the reliability function within the Data Engineering organization, you will ensure the availability, performance, and resilience of high-throughput telemetry and analytics platforms.
You will play a key role in designing systems where resilience, automation, and observability are built in from the start.
We are looking for engineers who are uncomfortable with manual toil and are driven to build platforms where scaling, recovery, and operational insight are inherent properties of the system architecture.
Skills & Requirements
Must-have
Site Reliability Engineering
DevOps
Platform Engineering
Linux systems administration
Cloud-native infrastructure
Kafka, Flink, Spark
Terraform
Prometheus, Grafana, OpenTelemetry
Nice-to-have
Data engineering platforms
Kubernetes
Stream processing frameworks
Real-time telemetry
Multi-cloud or hybrid cloud
Key Requirements
Proven experience in SRE, DevOps, or Platform Engineering
Experience operating high-throughput data platforms or streaming systems
Hands-on experience with Infrastructure as Code tools