AI Specialist (AI Engineering)

Hyphen Partners

Seattle, United States
On-site

Job Summary

  • The role focuses on optimizing large language and vision models for on-device inference.
  • The candidate will build pipelines for model distillation and hardware-specific compilation.
  • The position requires benchmarking performance across NPU and GPU architectures to validate the optimized models.

Skills & Requirements

Must-have

  • Model distillation and pruning techniques
  • 4-bit/8-bit quantization expertise
  • TensorRT and ONNX Runtime experience
  • C++ and Python programming skills
  • NPU/GPU architecture benchmarking

Nice-to-have

  • Edge deployment experience
  • Diverse hardware architecture knowledge
  • Cutting-edge AI solution development

Key Requirements

  • Expertise in model quantization and pruning
  • Hands-on experience with TensorRT
  • Strong C++ and Python coding skills

Work Rights

Not specified
