Design and implement scalable infrastructure for training and deploying deep learning models on top of a real-time robotic control stack
Job Summary
Design and implement scalable infrastructure for training and deploying deep learning models on top of a real-time robotic control stack.
Build distributed data pipelines that process large volumes of robotics data for model training.
Optimize the allocation of compute resources, such as GPUs and TPUs, to reduce cost and latency during model development.
Create orchestration workflows that run jobs reliably on Google Kubernetes Engine (GKE).
Skills & Requirements
Must-have
MLOps and machine learning infrastructure
Python and C++ programming
Docker and Kubernetes
TensorFlow, JAX, or PyTorch
Google Cloud Platform experience
Nice-to-have
Image processing workflow knowledge
Kubeflow toolkits
CUDA optimization
Production ML model deployment
Robotics systems familiarity
Key Requirements
2 years of experience in software development
Bachelor's degree in Computer Science, Robotics, or Machine Learning