Performance Engineer

Cerebras Systems

Toronto, Canada
On-site

Job Summary

  • Focus on CPU and memory subsystem optimizations for our Runtime software driver, speeding up key cloud and ML training/inference workloads on the modern x86 machines that form the backbone of our AI accelerator.
  • Optimize our workloads using advanced CPU features like AVX instructions, prefetch mechanisms, and cache optimization techniques.
  • Engage directly with the AI and ML developer community to understand their needs and solve contemporary challenges with innovative solutions.

Skills & Requirements

Must-have

  • CPU and memory subsystem optimizations
  • x86 architecture optimization
  • AVX instructions and prefetch mechanisms
  • performance profiling and characterization
  • C/C++ and Python proficiency

Nice-to-have

  • distributed systems experience
  • compiler technologies familiarity
  • PyTorch and ML frameworks

Key Requirements

  • 5+ years of relevant experience
  • BS, MS, or PhD in Computer Science or related field

Work Rights

Not specified
