Senior Software Engineer, AI Inference

NVIDIA Corporation

Toronto, Canada
On-site

Job Summary

  • Work directly with customer engineering teams to understand their LLM serving architectures and performance goals, then design and implement end-to-end benchmarking campaigns.
  • Set up and operate vLLM serving deployments on GPU clusters, tuning configurations for throughput, latency, and efficiency, and collect profiling traces to identify performance gaps.
  • Build internal tools, benchmarking harnesses, and automation pipelines that raise the productivity of your teammates and customers alike.

Salary

Base: 135,000 CAD - 220,000 CAD; Bonus/Equity: Not specified; Benefits: Not specified

Skills & Requirements

Must-have

  • LLM serving architectures
  • vLLM deployment and tuning
  • Kubernetes and Slurm environments
  • GPU performance analysis
  • Nsight Systems/Compute profiling
  • Open-source contributions

Nice-to-have

  • Customer-facing engineering
  • Building developer tools
  • ML compilers
  • GPU kernel development

Key Requirements

  • 5+ years of industry experience
  • Bachelor's, Master's, or PhD in CS/CE or equivalent
  • Experience deploying and operating LLM inference workloads
  • Proficiency with Kubernetes and Slurm
  • Understanding of LLM serving fundamentals
  • Familiarity with GPU performance analysis

Work Rights

Not specified
