Job Summary
Develop end-to-end pipelines that transform images and video into structured, reliable observations by combining modern vision models with multimodal reasoning and contextual signals.
Design, build, and improve multi-stage computer vision pipelines that may include segmentation, detection, tracking, and VLM-based analysis, producing structured outputs.
Build scalable cloud workflows for batch processing, integrate their outputs with APIs and downstream consumers, and improve operational performance while reducing cost.
Skills & Requirements
Must-have
Computer Vision
Deep Learning for Computer Vision
Python
PyTorch
Turning ML prototypes into reliable pipelines
Cloud or backend workflows
Nice-to-have
Vision-Language Models (VLMs)
Multimodal fusion
Video pipelines
Real-world datasets
Reusable platform components
Key Requirements
Bachelor’s degree in CS, EE, Robotics, or related field (or equivalent practical experience)
4+ years of experience building computer vision systems
Strong experience with deep learning for computer vision
Experience taking ML prototypes into reliable pipelines
Experience building or integrating ML systems into cloud or backend workflows