Job Summary
Architect and implement high-impact compiler subsystems including IR design, lowering pipelines, operator fusion, scheduling, and target code generation for our NPU.
Drive low-level optimizations and collaborate with kernel teams to close the HW/SW performance gap, including micro-kernel design and latency/throughput tradeoffs.
Mentor senior and junior engineers, lead design reviews, and raise the team’s technical bar.
Skills & Requirements
Must-have
ML compilers and toolchains
graph lowering, IR design
operator fusion, scheduling
target code generation for accelerators
expert C++, solid Python
production compiler components
Nice-to-have
mentorship and improving engineering practices
technical engagements with customers
influence product and silicon roadmap
Key Requirements
Deep expertise in ML compilers and toolchains
Proven experience with graph lowering
Strong systems programming skills
Demonstrated track record of shipping production compiler components