Machine Learning Performance Engineer

Jane Street

New York, United States
On-site

Job Summary

  • This role focuses on optimizing the performance of machine learning models for both training and real-time inference within a rapid-feedback trading environment.
  • The position requires a whole-systems approach that includes storage systems, networking, and host- and GPU-level considerations to ensure efficient large-scale operations.
  • Candidates must possess deep low-level GPU knowledge including PTX, SASS, warps, and memory hierarchy to debug and optimize CUDA performance effectively.

Skills & Requirements

Must-have

  • Low-level systems programming experience
  • Knowledge of CUDA, PTX, SASS, warps, and cooperative groups
  • Understanding of distributed GPU training and collective algorithms
  • Networking expertise with InfiniBand, RoCE, GPUDirect, and NVLink
  • Proficiency with Triton, CUTLASS, CUB, Thrust, cuDNN, and cuBLAS

Nice-to-have

  • Curious mind with passion for solving complex problems
  • Intuition about latency and throughput characteristics
  • Willingness to question current approaches and tools
  • Experience debugging end-to-end training run performance

Key Requirements

  • Understanding of modern ML techniques and toolsets
  • Debugging experience using CUDA GDB, Nsight Systems, and Nsight Compute
  • Background in NCCL or MPI collective algorithms

Work Rights

Not specified
