AI Specialist (AI Engineering)

Hyphen Partners

Boston, United States
On-site

Job Summary

  • The role focuses on optimizing the performance of large language and vision models for on-device inference.
  • Candidates will develop pipelines for model distillation and handle hardware-specific compilation tasks.
  • Performance benchmarking across various NPU and GPU architectures is a core responsibility of this position.

Skills & Requirements

Must-have

  • Model distillation and pruning techniques
  • 4-bit/8-bit quantization expertise
  • TensorRT and ONNX Runtime experience
  • C++ and Python programming skills
  • Performance benchmarking on NPU/GPU architectures

Nice-to-have

  • Edge deployment optimization experience
  • Knowledge of diverse hardware architectures
  • Cutting-edge AI solution development

Key Requirements

  • Expertise in model compression and quantization
  • Hands-on experience with TensorRT and ONNX Runtime
  • Strong proficiency in C++ and Python

Work Rights

Not specified