Site Reliability Engineer

Point72

Bengaluru, India
On-site

Job Summary

  • Design and implement automated operational workflows to improve system reliability and reduce manual intervention.
  • Build and maintain observability solutions using tools such as Datadog to deliver metrics, monitoring, alerting, and dashboards.
  • Partner with development teams to improve application reliability, deployment safety, and performance through SRE best practices.

Skills & Requirements

Must-have

  • Linux and Windows operating systems
  • Python or similar scripting languages
  • Datadog observability and monitoring
  • CI/CD pipelines and deployment automation
  • SQL Server and MongoDB operational knowledge
  • Cloud platforms (AWS or similar)
  • Kubernetes, OpenShift, and containerized workloads

Nice-to-have

  • Investor-led culture
  • Professional development encouragement
  • Intellectual curiosity
  • End-to-end system reliability
  • Self-service operational patterns

Key Requirements

  • Hands-on experience with Linux/Windows
  • Experience building automation with Python
  • Experience with CI/CD tools
  • Experience with SQL Server/MongoDB
  • Familiarity with cloud platforms
  • Experience with Kubernetes/OpenShift
  • Experience with infrastructure-as-code tools

Work Rights

Not specified
