Senior AI/ML Capacity and Performance Engineer

General Motors

Sunnyvale, California, US
Hybrid
ML infrastructure strategy
Large-scale ML training and inference
Python and PyTorch ecosystem

Job Summary

  • The mission of the AVCPE team is to provide input into large-scale ML infrastructure strategy, advise on key decisions affecting our cloud budget, identify and execute optimization projects, and provide capacity planning and engineering expertise to support GM’s efforts in developing autonomous vehicles (AV).
  • Conduct deep-dive analyses of production workloads to identify bottlenecks and propose high-impact optimization strategies.
  • GM offers a variety of health and wellbeing benefit programs, including medical, dental, vision, retirement savings plan, and paid vacation & holidays.

Salary

$144,700 to $261,300; Bonus Potential: Incentive pay program; Benefits: Health and wellbeing benefit programs

Skills & Requirements

Must-have

  • ML infrastructure strategy
  • Large-scale ML training and inference
  • Python and PyTorch ecosystem
  • Kubernetes for orchestrating workloads
  • Nvidia DCGM and Grafana
  • AWS, GCP, or Azure

Nice-to-have

  • Enterprise-grade Nvidia GPU architectures
  • Deploying open-source models
  • BigQuery for data analysis
  • Nvidia Nsight for performance tuning
  • Translate complex infrastructure needs

Key Requirements

  • 5+ years of professional experience
  • Bachelor’s Degree in Computer Science
  • Expert-level coding skills in Python
  • Resolving performance issues in large-scale distributed environments
  • Deep understanding of distributed systems and ML system design
  • Hands-on experience with Kubernetes
  • Technical proficiency with Nvidia DCGM, nvidia-smi, and Grafana
  • Extensive experience with major cloud ecosystems

Work Rights

Not specified
