Senior Solution Architect, Applied AI

WEKA

Remote
LLM inference pipelines
vLLM, llm-d, NIXL integration
KV cache reuse, speculative decoding
WEKA is seeking a Senior Solution Architect for Applied AI, focusing on high-performance systems for LLM inference. The ideal candidate will lead a small team in optimizing AI infrastructure while embodying the company's values of accountability, bravery, collaboration, and customer-centricity.

Job Summary

  • Architect and oversee the deployment of high-throughput, low-latency LLM inference pipelines.
  • Implement and evaluate state-of-the-art KV cache management solutions, including LMCache, and explore alternatives to minimize redundant computation.
  • Stay at the forefront of the "Inference-as-a-Service" domain, benchmarking new tools and deciding when to pivot the stack.

Skills & Requirements

Must-have

  • LLM inference pipelines
  • vLLM, llm-d, NIXL integration
  • KV cache reuse, speculative decoding
  • Python, C++, or Rust expertise
  • CUDA and GPU memory management
  • Kubernetes for scaling GPU workloads

Nice-to-have

  • Agentic AI data infrastructure
  • Cloud and AI-native software
  • Sustainable energy consumption
  • Customer-centric approach
  • Collaborative and brave culture

Key Requirements

  • AI inference domain experience
  • Specific stack familiarity (vLLM, LMCache, NIXL)
  • Backend engineering expertise
  • Infrastructure experience with Kubernetes

Work Rights

USA Residents Only
