Hyphen Connect is hiring a Multimodal AI Systems Architect to enhance voice-to-voice interactions and optimize multimodal retrieval capabilities in AI systems. The ideal candidate will possess expertise in integrating vision and audio models and have experience with technologies like Whisper and CLIP.
Job Summary
The role focuses on developing AI systems that seamlessly integrate vision and audio models to enhance voice-to-voice interactions.
Candidates will be responsible for architecting multimodal RAG systems capable of retrieving insights from videos and PDFs.
This position requires expertise in optimizing streaming latency so that AI agent reasoning loops run efficiently.
Matching Summary
Match Score: 85