Site Reliability Engineer (Infrastructure Applications) - Director P3 - ETS

Morgan Stanley

Job Summary

  • This role focuses on owning production reliability for multiple infrastructure applications while driving stability work to reduce noise and speed recovery.
  • Candidates must possess strong command-line troubleshooting skills in distributed systems and the ability to write production-ready automation in Python or Bash.
  • The position includes a rotating on-call schedule with periodic weekend coverage and requires calm, structured troubleshooting during high-impact incidents.

Salary

Not specified. Comprehensive benefits and perks are mentioned.

Skills & Requirements

Must-have

  • 7+ years production support experience
  • Linux/UNIX command-line troubleshooting
  • Python or Bash automation scripting
  • Distributed systems architecture knowledge
  • Incident response and RCA execution

Nice-to-have

  • Cloud-native deployment and containers
  • Grafana or Splunk observability tools
  • Database administration (SQL/NoSQL)
  • Autosys or Apache Airflow experience
  • AI-assisted development capabilities

Key Requirements

  • Minimum 7 years of Linux/UNIX production support
  • Strong written communication for technical documentation
  • Working understanding of load balancers and app servers

Work Rights

Not specified
