Ai And Systems Software Intern, At Scale Ai - Fall 2026

Invidia

Hourly: 20 usd - 71 usd; bonus/equity: not specifi...
Proficiency in python scripting
Bash/shell scripting skills
Debugging complex distributed systems
The role involves investigating and triaging failures within large-scale compute clusters to distinguish between software glitches and hardware faults

Job Summary

  • The role involves investigating and triaging failures within large-scale compute clusters to distinguish between software glitches and hardware faults.
  • Candidates will analyze logs and telemetry to correlate job failures with system-level issues, helping to reduce noise and identify root causes.
  • NVIDIA offers competitive hourly rates ranging from 20 USD to 71 USD along with intern benefits for this position.

Matching Summary

The role involves investigating and triaging failures within large-scale compute clusters to distinguish between software glitches and hardware faults.

Salary

Hourly: 20 USD - 71 USD; Bonus/Equity: Not specified; Benefits: Eligible for Intern benefits

Skills & Requirements

Must-have

  • Proficiency in Python scripting
  • Bash/Shell scripting skills
  • Debugging complex distributed systems
  • Experience with cluster managers
  • High-performance computing environments

Nice-to-have

  • Familiarity with server architecture
  • Experience with monitoring tools
  • Knowledge of system profiling tools
  • Benchmark analysis on Linux
  • Strong collaboration skills

Key Requirements

  • Pursuing BS, MS, or PhD in CS or Engineering
  • Proven debugging skills in distributed systems

Work Rights

Not specified

Tailored Resume

Cover Letter