Staff Machine Learning Engineer, ML Infrastructure
Unity Technologies
Mountain View, CA, United States
On-site
Job Summary
The role focuses on designing and evolving a large-scale offline ML platform that powers insight, experimentation, and AI-driven decision-making across the company.
You will develop infrastructure supporting distributed training workflows using technologies such as PyTorch, Ray Data, and Ray Train to handle growing data volumes.
Unity Technologies offers comprehensive benefits including health insurance, employee stock ownership, competitive retirement plans, and generous vacation days.
Salary
Base: $209,700 - $283,800 USD; Bonus/Equity: Not specified; Benefits: Comprehensive health, life, disability insurance, employee stock ownership, and retirement plans
Skills & Requirements
Must-have
Strong experience building large-scale ML pipelines
Experience with Ray Data and Ray Train frameworks
Deep experience designing production-grade data pipelines
Strong programming skills in Python for distributed workloads
Experience integrating ML pipelines with workflow orchestration systems
Nice-to-have
Familiarity with Spark or Flink distributed computing frameworks
Experience with modern data lakes and warehouses
Ability to lead technical direction without formal authority
Systems thinking regarding performance and cost tradeoffs
Key Requirements
Proven ability to lead technical direction and influence architectural decisions
Strong systems thinking, including reasoning about scalability and reliability
Experience working with large-scale distributed compute systems