Manager, Site Reliability Engineering

Dimensional Fund Advisors

Austin, Texas, United States
Fully remote
ELK
Prometheus
Grafana

Job Summary

  • Lead a global team of SREs to ensure the stability and performance of data platforms, increasing SRE coverage and technical expertise.
  • Own the monitoring strategy, manage error budgets and SLOs, and drive standardization of logging and observability across the organization.
  • Pursue automation to eradicate toil, build confidence in deployments through enhanced data quality assurance, and lead incident management and post-mortems.

Skills & Requirements

Must-have

  • ELK
  • Prometheus
  • Grafana
  • Python-based service development
  • Linux administration
  • CI/CD
  • Airflow
  • dbt
  • Snowflake
  • Automated testing
  • Incident management

Nice-to-have

  • Transformational change
  • Modern technologies
  • Systematic problem solving
  • Collaboration
  • Automation mindset
  • Security mindset
  • Coaching and mentorship

Key Requirements

  • Proficiency in Python-based service development, Linux administration, and CI/CD
  • Experience with data flows using Airflow, dbt, and Snowflake
  • Capability to write and run automated tests
  • Experience running software projects from ideation through operations
  • Demonstrated ability to be self-organized and self-driven
  • Strong communication skills to influence cross-functional partners

Work Rights

Not specified
