Cerebras Systems is seeking an Engineering Manager for its Inference ML Runtime team in Sunnyvale, CA, or Toronto, ON. The role requires strong technical leadership in building and scaling AI inference systems, with a focus on machine learning and distributed systems.
Job Summary
Lead a team responsible for designing and scaling the systems that enable seamless execution of state-of-the-art AI models on Cerebras hardware.
Own the architecture and evolution of the ML inference runtime and serving systems, guiding the design of high-throughput, low-latency inference pipelines.
Scale Cerebras’ inference platform to serve large volumes of concurrent requests at high speed, and drive improvements in latency, throughput, and compute efficiency.
Matching Summary
Match Score: 85
Skills & Requirements
Must-have
ML inference runtime and serving systems
High-throughput, low-latency inference pipelines
Multimodal model execution
Scalable serving infrastructure
Python and C++ programming
Large-scale inference systems
Nice-to-have
LLM serving frameworks
PyTorch and deep learning frameworks
Distributed systems and HPC
ML runtime systems
Performance optimization for AI workloads
Key Requirements
8+ years of experience in large-scale software engineering or ML/distributed systems
2+ years of engineering management experience
Experience building and scaling large-scale inference systems
Experience with cloud infrastructure and scalable microservices