Site Reliability Engineer II (SRE) - Guidewire Cloud Platform (Application)
Guidewire Software
Job Summary
This role offers a unique opportunity to apply engineering principles to ensure the reliability of the Guidewire Cloud Platform used by hundreds of insurers worldwide.
Candidates will participate in mandatory on-call rotations to respond to incidents and alerts outside regular business hours, ensuring service availability.
The position involves developing automated runbooks and optimizing systems to reduce manual tasks while collaborating cross-functionally with development teams.
Skills & Requirements
Must-have
Experience with SRE principles and SLI/SLOs
Proficiency in AWS or Kubernetes infrastructure
Strong troubleshooting skills for distributed systems
Familiarity with Datadog monitoring tools
Ability to script in Python, Go, Java, or Shell
Nice-to-have
Interest in pursuing SRE or AWS certifications
Experience with SQL and database performance tuning
Knowledge of open-source data processing frameworks
Exposure to CI/CD pipelines like Jenkins or GitHub Actions
Passion for continuous learning and innovation
Key Requirements
Basic understanding of SLIs, SLOs, and error budgets
Experience deploying infrastructure using Terraform