Senior Networking Solution Test Engineer, AI Cluster Debugging

NVIDIA

Israel
Hybrid
System-level debugging on Linux
Ethernet/NIC/DPU/Switch testing
AI cluster troubleshooting

Job Summary

  • You will work on cutting-edge Ethernet-based AI clusters, owning complex issues across hardware, system software and AI workloads.
  • Design and review test and product requirements across the Ethernet / NIC / DPU / Switch portfolio, focusing on large-scale AI cluster behavior.
  • NVIDIA is widely considered to be one of the technology world’s most desirable employers with a diverse and inclusive work environment.

Skills & Requirements

Must-have

  • system-level debugging on Linux
  • Ethernet/NIC/DPU/Switch testing
  • AI cluster troubleshooting
  • networking protocol debugging
  • host-side NIC validation and tuning
  • scripting with Bash/Python/Ansible
  • performance and scale testing

Nice-to-have

  • debugging collective communication libraries
  • large-scale LLM training cluster experience
  • congestion control and lossless Ethernet tuning
  • familiarity with NVIDIA networking technologies
  • multi-layer network issue debugging
  • collaborative mindset
  • fast learner with AI tools

Key Requirements

  • B.A./B.Sc. in Computer Science or Electrical Engineering
  • 5+ years networking or system-level testing experience
  • strong Linux networking and debugging skills
  • proven production-grade debugging experience
  • expertise in AI networking libraries and protocols
  • ability to read and reason about source code
  • solid scripting and automation skills

Work Rights

Not specified
