Senior Performance Engineer

Invidia

Israel
Performance analysis of ai and hpc workloads
High-performance networking expertise
Cross-functional r&d collaboration
You will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments

Job Summary

  • You will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments.
  • This is a fast-paced R&D environment where system behavior and requirements evolve rapidly, requiring adaptable engineering solutions and strong analytical thinking.
  • You will work closely with hardware, networking, firmware, and software teams to collect, analyze, and interpret performance data from live systems.

Matching Summary

You will help build and evolve systems that support performance analysis, telemetry, and optimization for large-scale GPU- and CPU-based clusters used in AI and high-performance computing environments.

Skills & Requirements

Must-have

  • Performance analysis of AI and HPC workloads
  • High-performance networking expertise
  • Cross-functional R&D collaboration
  • Telemetry collection and data refinement
  • Performance benchmarking and diagnostic tools

Nice-to-have

  • Knowledge of CUDA and NCCL internals
  • Experience with deep learning frameworks
  • Proficiency in Python and Linux environments
  • Experience with cloud platforms
  • Strong analytical and communication skills

Key Requirements

  • B.Sc. or M.Sc. in Computer Science or related field
  • 5+ years experience in performance analysis or HPC/AI infrastructure
  • Hands-on experience with RDMA, MPI, NCCL
  • Strong understanding of system performance metrics
  • Ability to work in fast-paced cross-functional teams

Work Rights

Not specified

Tailored Resume

Cover Letter