Job Summary
NVIDIA is seeking a motivated AI Acceleration & Optimization Engineer to improve the performance, scalability, and efficiency of modern AI models across NVIDIA GPU platforms.
The role involves collaborating with the architecture, research, CUDA, compiler, and framework teams to bring next-generation AI workloads from research to production with strong performance and reliability.
Responsibilities include optimizing AI models such as LLMs, VLMs, and diffusion models; profiling workloads; supporting optimization techniques; and contributing to benchmarking and internal tooling.
Skills & Requirements
Must-have
AI model optimization on GPUs
CUDA and parallel computing
Deep learning framework experience
Performance profiling and analysis
Python programming skills
Nice-to-have
Experience with TensorRT and vLLM
Knowledge of quantization and sparsity
Distributed training or inference experience
Open-source ML systems contributions
C++ proficiency and debugging skills
Key Requirements
Bachelor’s or Master’s degree in a relevant field, or equivalent experience
2–4 years of experience in deep learning or high-performance computing
Good understanding of deep learning fundamentals and transformer architectures
Familiarity with GPU architecture and parallel computing concepts