Lead a global team of SREs to ensure the stability and performance of data platforms, increasing SRE coverage and technical expertise
Job Summary
Lead a global team of SREs to ensure the stability and performance of data platforms, increasing SRE coverage and technical expertise.
Own the monitoring strategy, manage error budgets and SLOs, and drive standardization of logging and observability across the organization.
Pursue automation to eradicate toil, build confidence in deployments through enhanced data quality assurance, and lead incident management and post-mortems.
Skills & Requirements
Must-have
ELK
Prometheus
Grafana
Python-based service development
Linux administration
CI/CD
Airflow
dbt
Snowflake
automated testing
incident management
Nice-to-have
transformational change
modern technologies
systematic problem solving
collaboration
automation mindset
security mindset
coaching and mentorship
Key Requirements
Proficiency in Python-based service development
Experience with Linux administration
Experience with CI/CD
Experience with data flows using Airflow, dbt and Snowflake
Capability to write and run automated tests
Experience running software projects from ideation through operations
Demonstrated ability to be self-organized and self-driven
Strong communication skills to influence cross-functional partners