Job Summary
We are seeking an experienced Site Reliability Engineer (SRE) with strong DevOps and automation expertise to ensure the reliability, scalability, and performance of distributed systems.
You will play a critical role in building and maintaining monitoring platforms, automating operational processes, and improving system reliability across multiple application domains.
This role focuses on CI/CD automation, monitoring, observability, and system troubleshooting across cloud-native and Kubernetes-based environments.
Skills & Requirements
Must-have
Site Reliability Engineering and DevOps expertise
CI/CD pipeline automation and deployment
Kubernetes monitoring with Prometheus and Grafana
Python, Groovy, and Shell scripting
OpenTelemetry observability implementation
Kubernetes and cloud-native operations
Nice-to-have
Collaboration with engineering and platform teams
Observability and monitoring best practices
Automation-first mindset
Key Requirements
Hands-on experience in Site Reliability Engineering and DevOps roles
Experience managing Prometheus and Grafana for Kubernetes monitoring
Strong scripting skills in Python, Groovy, and Shell
Experience with OpenTelemetry for distributed tracing