Multimodal AI Systems Architect (AI Engineering)

Hyphen Partners

Oregon, United States
On-site
Integrate vision encoders and audio-native models
Optimize streaming latency for voice interactions
Architect multimodal RAG systems for video and PDFs
Hyphen Partners is looking for a Multimodal AI Systems Architect to develop and enhance AI systems that integrate vision and audio models for improved voice interactions. The role focuses on optimizing streaming latency and architecting multimodal retrieval systems.

Job Summary

  • The role focuses on developing AI systems that seamlessly integrate vision and audio models to enhance voice-to-voice interactions.
  • Candidates will be responsible for optimizing streaming latency and architecting multimodal RAG systems capable of retrieving insights from videos and PDFs.
  • This position requires deep expertise in cross-modal alignment and experience with specific tools like Whisper and CLIP.

Skills & Requirements

Must-have

  • Integrate vision encoders and audio-native models
  • Optimize streaming latency for voice interactions
  • Architect multimodal RAG systems for video and PDFs
  • Experience with Whisper, CLIP, and multimodal LLMs
  • Knowledge of streaming architectures and WebRTC
  • Expertise in cross-modal alignment

Nice-to-have

  • Efficient and innovative system design
  • Seamless integration of core agent reasoning loops

Work Rights

Not specified
