Job Summary
You work close to both the code and production environments, designing and building solutions that make our platform measurable, predictable, and resilient.
You will work primarily from our Pune office, while collaborating closely with the rest of the SRE team in our Dutch office.
You work on a mission-critical SaaS platform with real customer impact, and you influence how reliability and observability are engineered into software.
Matching Summary
You work close to both the code and production environments, designing and building solutions that make our platform measurable, predictable, and resilient.
Skills & Requirements
Must-have
design and build reliable solutions
define and implement SLIs and SLOs
build meaningful monitoring and alerting
automate away operational risk
experience changing application code
design observability as part of system architecture
Nice-to-have
think like an engineer when things go wrong
question existing reliability practices
use data to drive measurable improvements
collaborate across teams
distributed systems or large SaaS platforms
regulated or compliance-sensitive environments
Key Requirements
5+ years of experience as a Site Reliability Engineer
Strong experience with monitoring, alerting, and observability
Proven ability to design and work with SLIs and SLOs
Hands-on coding experience
Experience building automation
Experience working with Microsoft Azure or other major public cloud providers