LLM Inference Frameworks and Optimization Engineer

Together.ai

San Francisco, United States
Base: $160,000 - $230,000, plus equity and benefits
On-site
Distributed inference engines
GPU/accelerator optimizations
CUDA graph optimizations
Together.ai is building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs).

Job Summary

  • Together.ai is building state-of-the-art infrastructure to enable efficient and scalable inference for large language models (LLMs).
  • Responsibilities include designing and developing fault-tolerant, high-concurrency distributed inference engines and implementing optimized distributed inference strategies.
  • The company offers competitive compensation, startup equity, health insurance, and other benefits.

Salary

Base: $160,000 - $230,000; Equity: startup equity; Benefits: health insurance and other competitive benefits

Skills & Requirements

Must-have

  • distributed inference engines
  • GPU/accelerator optimizations
  • CUDA graph optimizations
  • TensorRT-LLM graph optimizations
  • KV cache systems
  • Python and C++/CUDA proficiency

Nice-to-have

  • large-scale data center networks
  • distributed filesystem experience
  • Kubernetes (K8s) familiarity
  • open-source deep learning contributions

Key Requirements

  • 3+ years of experience
  • Experience with LLM inference frameworks
  • Background in GPU programming or compilers
  • Understanding of Transformer architectures

Work Rights

Not specified
