Multimodal AI Systems Architect (AI Engineering)

Hyphen Partners

Hong Kong, Hong Kong

  • Integrate vision encoders and audio-native models
  • Optimize streaming latency for voice interactions
  • Architect multimodal RAG systems

Hyphen Partners is seeking a Multimodal AI Systems Architect to enhance voice-to-voice interactions and develop integrated AI systems that leverage vision and audio models. The ideal candidate should have experience in multimodal integration, specifically with models such as Whisper and CLIP, as well as knowledge of streaming architectures.

Job Summary

  • The role focuses on developing AI systems that integrate vision and audio models to enhance voice-to-voice interactions.
  • Candidates will be responsible for architecting multimodal RAG systems capable of retrieving insights from videos and PDFs.
  • This position requires optimizing streaming latency so that real-time voice interactions remain responsive.

Skills & Requirements

Must-have

  • Integrate vision encoders and audio-native models
  • Optimize streaming latency for voice interactions
  • Architect multimodal RAG systems
  • Experience with Whisper and CLIP models
  • Knowledge of WebRTC streaming architectures

Nice-to-have

  • Expertise in cross-modal alignment techniques
  • Innovative system design capabilities
  • Efficient multimodal retrieval strategies

Key Requirements

  • Experience with multimodal LLM integration
  • Knowledge of streaming architectures
  • Expertise in cross-modal alignment

Work Rights

Not specified