Job Summary
This role focuses on optimizing and deploying large-scale multimodal models onto vehicle-grade compute platforms for autonomous driving.
Candidates will support model quantization, pruning, and compression techniques under the guidance of senior engineers.
The position involves collaborating with research and platform teams to improve model deployability and to analyze performance metrics such as latency and memory usage.
Skills & Requirements
Must-have
Strong C++ and Python programming skills
Familiarity with the PyTorch deep learning framework
Understanding of model inference and deployment workflows
Knowledge of ONNX, TensorRT, or similar frameworks
Exposure to INT8/FP16 quantization concepts
Nice-to-have
Experience with CUDA or GPU programming
Background in Transformers or multimodal models
Interest in computer architecture and edge systems
Previous internship in embedded AI or inference acceleration
Contributions to open-source repositories
Key Requirements
BS, MS, or PhD in Computer Science, Electrical Engineering, Robotics, or a related field
Strong problem-solving skills in a fast-paced engineering environment