Multimodal AI Systems Architect (AI Engineering)

Hyphen Connect

Australia
On-site
Integrate vision encoders and audio-native models
Optimize streaming latency for voice interactions
Architect multimodal RAG systems
Hyphen Connect is looking for a Multimodal AI Systems Architect to enhance AI systems that integrate vision and audio models, with a focus on voice interactions and multimodal retrieval. The ideal candidate will have experience with Whisper, CLIP, and WebRTC streaming architectures, and will use that experience to optimize system performance.

Job Summary

  • The role focuses on developing AI systems that seamlessly integrate vision and audio models to enhance voice-to-voice interactions.
  • Candidates will be responsible for optimizing streaming latency and architecting multimodal RAG systems capable of retrieving insights from videos and PDFs.
  • This position requires deep expertise in cross-modal alignment and the integration of specific tools like Whisper and CLIP into core agent reasoning loops.

Skills & Requirements

Must-have

  • Integrate vision encoders and audio-native models
  • Optimize streaming latency for voice interactions
  • Architect multimodal RAG systems
  • Experience with Whisper and CLIP models
  • Knowledge of WebRTC streaming architectures

Nice-to-have

  • Expertise in cross-modal alignment techniques
  • Innovative system design capabilities
  • Efficient multimodal LLM integration skills

Key Requirements

  • Experience with multimodal LLM integration
  • Knowledge of streaming architectures and WebRTC
  • Expertise in cross-modal alignment

Work Rights

Not specified
