Staff Technical Lead for Inference & ML Performance

fal

San Francisco, United States
On-site
ML performance optimization
Large-scale generative models
PyTorch, TensorRT, Triton

Job Summary

  • You’ll shape the future of fal’s inference engine and ensure our generative models achieve best-in-class performance.
  • Your work directly impacts our ability to rapidly deliver cutting-edge creative solutions to users, from individual creators to global brands.
  • One of the highest-impact roles at one of the fastest-growing companies, with a world-changing vision: hyperscaling human creativity.

Skills & Requirements

Must-have

  • ML performance optimization
  • Large-scale generative models
  • PyTorch, TensorRT, Triton
  • Kernel authoring
  • Model parallelism
  • Distributed serving
  • Hands-on individual contributor (IC) leadership

Nice-to-have

  • Inference engines for generative media
  • A track record of industry-leading performance improvements
  • Scaling technical teams

Key Requirements

  • Deep experience in ML performance optimization
  • Expert-level familiarity with advanced inference techniques
  • Experience with PyTorch, TensorRT, TransformerEngine, Triton
  • Experience with CUTLASS kernels

Work Rights

Not specified