Senior Site Reliability Engineer - Observability

Okta

Bengaluru, India
On-site

Job Summary

  • This role offers an opportunity to own and evolve Okta's observability ecosystem, moving beyond simple monitoring to architect a comprehensive, scalable telemetry platform.
  • You will be responsible for Splunk optimisation, ensuring the logging architecture is performant, cost-effective, and deeply integrated with automated workflows, while managing infrastructure as code with Terraform and applying strong coding skills.
  • Key responsibilities include leading Splunk and Grafana architecture, automating infrastructure deployment, optimising telemetry data pipelines, developing custom Splunk workflows for automated responses, and participating in on-call rotations.

Skills & Requirements

Must-have

  • Splunk architecture and optimisation
  • Grafana dashboard design
  • Terraform for infrastructure automation
  • Go, Python, or Ruby coding
  • OpenTelemetry instrumentation
  • Linux internals and networking

Nice-to-have

  • Distributed tracing implementation
  • Security observability workflows
  • Cloud native observability tools

Key Requirements

  • At least 3 years of SRE/DevOps experience
  • Splunk administration and SPL expertise
  • Experience building actionable Grafana dashboards
  • High-availability systems focus
  • Kubernetes/EKS experience

Work Rights

Not specified
