Ensure the reliability, availability, and performance of Unix/Linux platforms by proactively monitoring, optimizing, and improving system stability using SRE methodologies
Job Summary
Ensure the reliability, availability, and performance of Unix/Linux platforms by proactively monitoring, optimizing, and improving system stability using SRE methodologies.
Lead incident response for high-severity issues, perform root-cause analysis (RCA), and implement permanent fixes to prevent recurrence.
Join a dynamic organisation of 25,000 people across 65 countries that values innovation, quality, and continuous improvement.
Skills & Requirements
Must-have
Unix/Linux platform reliability
Incident response and RCA
Infrastructure automation with Ansible/Terraform
Observability with Datadog/Prometheus
Shell scripting (Bash, ksh)
Python/Perl programming skills
Nice-to-have
Continuous improvement mindset
Collaborative problem-solving
Data-driven decision making
Commitment to sustainability
Key Requirements
Bachelor's degree in Computer Science or related field