SRE Observability & SLO Engineer

GE VERNOVA

Job Summary

  • This role serves as the eyes and ears of the GridOS SRE team by building the full telemetry stack for mission-critical energy management systems.
  • The engineer will define meaningful Service Level Indicators (SLIs) and Service Level Objectives (SLOs) and govern the review cycle to drive prioritization of reliability work.
  • This high-impact position focuses on establishing v1.0 observability coverage across customer environments. Relocation assistance is provided.

Skills & Requirements

Must-have

  • 3-5 years SRE or observability experience
  • Deep expertise in Datadog or Grafana/Prometheus
  • Hands-on Kubernetes (EKS) observability implementation
  • Proficiency in PromQL or CloudWatch query languages
  • Experience defining SLIs, SLOs, and error budgets
  • Strong Python or Bash scripting skills

Nice-to-have

  • Familiarity with vendor-agnostic instrumentation using OpenTelemetry
  • Experience with AWS CloudWatch Synthetics
  • Knowledge of chaos engineering practices
  • Exposure to AIOps or ML-driven anomaly detection
  • Background in regulated industries like energy or utilities
  • AWS certifications in CloudWatch or Solutions Architect

Key Requirements

  • Bachelor's Degree in Computer Science, STEM, or Information Management
  • 3-5 years of experience in SRE or infrastructure reliability roles

Work Rights

Not specified
