Job Summary
Support enterprise infrastructure and application observability by implementing and operating monitoring solutions that enable proactive detection, incident response, and service reliability.
Configure, maintain, and enhance observability platforms such as Zabbix, Playwright, Prometheus, Grafana, and BigPanda across on-premises and cloud environments.
Develop and maintain automation scripts and workflows (e.g., StackStorm, Python, Shell) to improve monitoring deployment, troubleshooting, and operational efficiency.
Skills & Requirements
Must-have
Implement and operate monitoring solutions
Configure and maintain observability platforms
Integrate monitoring tools with ITSM
Build and maintain alerts and dashboards
Develop automation scripts using Python/Shell
Troubleshoot monitoring and event management issues
Nice-to-have
Exposure to logging platforms
Familiarity with cloud-native observability
Knowledge of SRE practices
Understanding of reliability engineering concepts
Key Requirements
Bachelor’s degree or equivalent practical experience
4–6 years of experience in monitoring/observability
Hands-on experience with Zabbix, Prometheus, Grafana, Playwright, or BigPanda
Experience integrating monitoring with ITSM/event management
Practical experience with Python and/or Shell scripting
Understanding of infrastructure, cloud, and application architectures