Job Summary
Ensure the reliability, availability, and performance of production systems by implementing best practices in monitoring, alerting, and incident response.
Develop and maintain automation tools and scripts to streamline deployment, scaling, and operational tasks.
Work closely with development teams and other stakeholders to ensure that new features and services are designed with reliability and scalability in mind.
Skills & Requirements
Must-have
System Reliability and Maintenance
Automation and Scripting
Incident Management and Response
Monitoring and Alerting Systems
Performance Optimization
Microservices, Spring Boot, AngularJS
Oracle, Unix Shell Scripting
Java and Python Scripting
Nice-to-have
Agile and SAFe Methodologies
Google Cloud experience
Messaging middleware knowledge
Key Requirements
Proven IT experience
Experience with microservices, Spring Boot, and AngularJS
Strong hands-on and scripting experience with Oracle and Unix shell scripting
Strong knowledge of Oracle database management, SQL scripts and PL/SQL, and performance management
Strong understanding of Unix, Linux, and Windows
Strong experience with cloud technology, specifically Google Cloud (preferred)
Strong scripting experience in Java, Python, and shell
Solid understanding of messaging middleware such as Solace, TIBCO, or MQ using JMS
Solid understanding of monitoring systems such as ITRS Geneos and Splunk
Amenable to a hybrid work model and a morning shift (6:30 AM to 3:30 PM)