Machine Learning Performance Engineer

Jane Street

London, United Kingdom
On-site

Job Summary

  • This role focuses on optimizing the performance of machine learning models for both training and real-time inference within a rapid-feedback trading environment.
  • Candidates must possess deep low-level GPU knowledge including PTX, SASS, warps, and memory hierarchy to ensure efficient large-scale operations.
  • The team values an inventive approach and encourages asking hard questions about whether the right tools and methodologies are being used.

Skills & Requirements

Must-have

  • Low-level systems programming experience
  • Knowledge of CUDA, PTX, SASS, warps, and cooperative groups
  • Understanding of collective algorithms for distributed GPU training
  • Familiarity with InfiniBand, RoCE, GPUDirect, and NVLink networking technologies
  • Knowledge of Triton, CUTLASS, CUB, Thrust, cuDNN, and cuBLAS

Nice-to-have

  • Curious mind with passion for solving problems
  • Intuitive approach to latency and throughput characteristics
  • Willingness to question current approaches and tools
  • Experience debugging training runs end to end

Key Requirements

  • Fluency in English required
  • Experience with CUDA-GDB and Nsight tools
  • Background in NCCL or MPI collective algorithms

Work Rights

Not specified
