Distributed LLM Inference Engineer

Anyscale

San Francisco, United States
Remote

Job Summary

  • This role is critical for achieving a market-leading position in AI infrastructure by pushing the boundaries of performance for large-scale inference.
  • The engineer will iterate quickly with product teams to ship end-to-end solutions for batch and online inference used by open-source Ray users and customers.
  • Anyscale offers comprehensive benefits including healthcare premiums covered at 99% for employees and dependents, stock options, and paid parental leave.

Salary

Market-based; amount not specified. Eligible for stock options and equity offerings.

Skills & Requirements

Must-have

  • Familiarity with running ML inference at large scale
  • Solid understanding of distributed systems
  • Experience with deep learning frameworks like PyTorch

Nice-to-have

  • Prior experience working on GPUs or CUDA
  • Contributions to deep learning compilers like Triton
  • Experience integrating Ray Data and LLM engines

Key Requirements

  • Familiarity with high-throughput, low-latency ML inference
  • Knowledge of state-of-the-art research in open-source communities

Work Rights

Not specified
