Staff Inference ML Runtime Engineer

Cerebras Systems

Sunnyvale, CA, US
On-site
Cerebras Systems is seeking a Staff Inference ML Runtime Engineer to join its team focused on enabling high-performance generative AI inference solutions. The role involves designing and implementing APIs and ML features on the company’s custom hardware, and it requires extensive experience in software engineering and machine learning.

Job Summary

  • The Inference ML Engineering team is dedicated to enabling our fast generative inference solution through simple APIs powered by a distributed runtime.
  • As a Senior Software Engineer, you will play a key role in designing and implementing APIs, ML features, and tools that enable running state-of-the-art generative AI models on our custom hardware.
  • Cerebras builds the world's largest AI chip, 56 times larger than the largest GPUs, offering industry-leading training and inference speeds.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • design and implement ML features
  • high-throughput, low-latency multimodal inference
  • scalable serving backend
  • optimize software for LLM inference
  • Python for scalable systems
  • C++ for performance optimization

Nice-to-have

  • state-of-the-art generative AI models
  • cross-functional initiative leadership
  • agile team environment
  • continuous learning and growth

Key Requirements

  • 8+ years of experience in large-scale software engineering
  • Bachelor’s, Master’s, or PhD in Computer Science or related field
  • Experience building and scaling large-scale inference systems for LLMs
  • Familiarity with LLM serving frameworks (vLLM, SGLang, TensorRT-LLM)
  • Hands-on experience with ML frameworks (PyTorch)

Work Rights

Not specified
