Senior ML Engineer - Kimchi (LLM Inference Optimization)

Cast AI Group Inc

Austria
Competitive salary depending on experience; equity...
Fully remote
5+ years building real ML systems
Production Python services development
Experience with vLLM, SGLang, or TensorRT-LLM
Cast AI is seeking a Senior ML Engineer for their Kimchi team, focusing on optimizing inference for large language models (LLMs) within cloud-native environments. The ideal candidate will have extensive experience in building ML systems, particularly in performance tuning and infrastructure optimization.

Job Summary

  • This role focuses on optimizing throughput, latency, and KV cache utilization to improve customer inference speed and company margins.
  • The successful candidate will lead the technical direction of the Kimchi system, which automatically matches workloads to the most cost-efficient LLM configurations.
  • Employees enjoy a flexible remote-first environment with equity options, a learning budget, and dedicated time for personal projects.

Salary

Competitive salary depending on experience; equity options included; flexible remote-first environment

Skills & Requirements

Must-have

  • 5+ years building real ML systems
  • Production Python services development
  • Experience with vLLM, SGLang, or TensorRT-LLM
  • Fluency with quantization tradeoffs and the resulting quality regressions
  • Comfort with distributed systems and multi-GPU setups
  • Bias toward measurement and instrumentation before optimization

Nice-to-have

  • Self-direction and high autonomy in technical leadership
  • Experience with continuous batching and speculative decoding
  • Knowledge of KV cache optimization strategies
  • Familiarity with GCP, AWS, and Azure cloud environments
  • Ability to set technical direction and benchmark standards

Key Requirements

  • 5+ years of experience in ML infrastructure
  • Strong production Python skills
  • Hands-on inference engine experience
  • Distributed systems fluency
  • No visa sponsorship provided

Work Rights

Not specified
