Architect high-performance inference runtimes, kernel dispatchers, and memory planners for large diffusion and transformer workloads.
Must-have
Nice-to-have
Not specified