Site Reliability Engineering Lead

Truist Bank

United States of America
Distributed systems
Container orchestration (Kubernetes)
Cloud-native operational tooling

Truist Bank is seeking a Site Reliability Engineering Lead to enhance the reliability and operational excellence of its critical enterprise platforms. The role requires a hands-on technical leader who can drive improvements in observability, automation, and incident management while collaborating with cross-functional teams.

Job Summary

  • The Site Reliability Engineering Lead is a senior, hands-on technical leader accountable for elevating the reliability, resiliency, and operational excellence of critical enterprise platforms across hybrid cloud and on-prem environments.
  • This position plays a pivotal role in building and maturing the SRE Center for Enablement (C4E) by contributing standards, repeatable patterns, runbooks, playbooks, and coaching that amplify reliability practices across the enterprise.
  • The role partners closely with Application Development, Infrastructure, Production Support, Platform Delivery, Architecture, Cybersecurity, Risk, and Business technology teams to uplift operational practices and deliver stable, predictable, and scalable services.

Skills & Requirements

Must-have

  • distributed systems
  • container orchestration (Kubernetes)
  • cloud-native operational tooling
  • automation and scripting languages
  • observability platforms
  • major incident management

Nice-to-have

  • financial services industry experience
  • chaos engineering
  • hybrid-cloud operational frameworks
  • SRE Center for Enablement

Key Requirements

  • 7+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Infrastructure Operations
  • Proficiency with automation tools and scripting languages (Python, Go, PowerShell, Ansible)
  • Strong understanding of observability platforms (Splunk, Dynatrace)
  • Proven leadership in major incident management and cross-team technical coordination
  • Strong grasp of networking and Linux/Unix internals

Work Rights

Not specified