NVIDIA is seeking a sharp, innovative, and hands-on Architect to help shape the future of LLM inference at scale
Job Summary
NVIDIA is seeking a sharp, innovative, and hands-on Architect to help shape the future of LLM inference at scale.
You will work across software and hardware domains to design and optimize inference infrastructure for large language models running on some of the most advanced GPU clusters in the world.
This is an opportunity to work with top engineers, researchers, and partners across NVIDIA and leave a mark on the way generative AI reaches real-world applications.
Skills & Requirements
Must-have
Large-scale distributed systems
GPU acceleration and deep learning
C++ and/or Python programming
Inference infrastructure optimization
Multi-node LLM inference architecture
System-level memory and networking orchestration
Nice-to-have
Transformer model optimization
Model-parallel deployments
Profiling and performance bottleneck optimization
AI accelerators and distributed communication
Congestion control and load balancing
Passion for solving tough technical problems
Key Requirements
Bachelor’s, Master’s, or PhD in Computer Science or Electrical Engineering
8+ years of experience in performance-critical software
Experience building large-scale distributed systems