Job Summary
Build software solutions and systems to manage platform infrastructure and applications, partnering with development teams to improve services through rigorous testing and release procedures.
Improve the reliability, quality, and time-to-market of our suite of software solutions by running the production environment and building monitoring that alerts on symptoms rather than on outages.
Balance feature development speed and reliability with well-defined service level objectives, partnering with stakeholders to design and deliver a reliable, scalable, secure, and performant platform.
Skills & Requirements
Must-have
Public Cloud (AWS) hands-on experience
Infrastructure as Code (Terraform)
Kubernetes (EKS) deployment and management
Observability and Monitoring tools
Scripting and Automation (Python, PowerShell, Bash)
Windows and Linux environments
DevOps and CI/CD practices
Nice-to-have
ServiceNow for ticket and incident management
Harness.io for CI/CD deployments
Microsoft Azure services exposure
AWS or Azure certifications
AWS Lambda experience
PostgreSQL administration or development
Capital Markets and financial services understanding
IT event correlation and analysis software
Disaster Recovery/Business Continuity planning
Leadership and mentoring junior staff
Key Requirements
5+ years of experience in IT operations, infrastructure management, or related technical roles
Public Cloud (AWS) hands-on experience
Strong experience with Infrastructure as Code (Terraform)
Kubernetes (EKS) deployment and management experience
Proficiency with monitoring tools (CloudWatch, Grafana, Prometheus, Splunk)
Automation using Python, PowerShell, and Bash
Solid experience with Windows and Linux environments
Working knowledge of DevOps and CI/CD pipelines
Strong troubleshooting skills for production environments
Skilled in diagnosing and resolving application and infrastructure failures
Ability to create technical documentation and communicate effectively