Principal Software Engineer - Inference as a Service

NVIDIA

Santa Clara, CA, US
Work arrangement: Not specified
Distributed systems
Large-scale backend infrastructure
GPU resource management
NVIDIA is seeking a Principal Software Engineer to join its Software Infrastructure Team in Santa Clara, CA, to develop its Inference as a Service platform. The ideal candidate will have extensive software engineering experience, particularly in distributed systems and backend infrastructure, and will work on optimizing GPU resource management for AI models.

Job Summary

  • Lead the design and development of a scalable, robust, and reliable platform for serving AI models for inference as a service.
  • Architect and implement systems for dynamic GPU resource management, autoscaling, and efficient scheduling of inference workloads.
  • Optimize system performance and latency for a range of model types, from large language models (LLMs) to computer vision models, ensuring high throughput and responsiveness.

Matching Summary

Match Score: 85

Salary

Base: 248,000 USD - 391,000 USD, plus equity and benefits (details not specified)

Skills & Requirements

Must-have

  • distributed systems
  • large-scale backend infrastructure
  • GPU resource management
  • high-performance, low-latency API services
  • container orchestration technologies such as Kubernetes
  • modern observability tools

Nice-to-have

  • specialized inference serving frameworks
  • open-source contributions
  • performance optimization techniques for AI models
  • full lifecycle of an AI model

Key Requirements

  • 15+ years of software engineering experience
  • BS, MS, or PhD in Computer Science or related fields (or equivalent experience)
  • Strong programming skills in Python, Go, or C++
  • Proven experience with container orchestration technologies such as Kubernetes
  • Experience in designing, implementing, and optimizing systems for GPU resource management

Work Rights

Not specified
