Performance profiling and debugging of ML workloads
Job Summary
This role focuses on enabling and optimizing foundation models, such as LLMs and diffusion models, within key frameworks such as vLLM and MaxText.
Engineers will partner directly with customers to measure AI/ML model performance and resolve technical bottlenecks on Google Cloud infrastructure.
The position requires conducting performance profiling, debugging training and inference workloads, and collaborating with internal teams to enhance support for demanding AI workloads.
Skills & Requirements
Must-have
Foundational model optimization (LLMs, Diffusion)
vLLM, MaxText, MaxDiffusion frameworks
Performance profiling and debugging of ML workloads
Customer partnership for AI/ML performance measurement
Root cause analysis for system issues
Nice-to-have
Full-stack versatility and leadership qualities
Experience with distributed computing systems
Strong communication skills for customer collaboration
Ability to drive product improvements and bug fixes
Enthusiasm for solving new technical problems
Key Requirements
Experience with large-scale system design
Proficiency in artificial intelligence and machine learning
Background in natural language processing or data storage