AI Inference Engineer

F5

Job Summary

  • This role bridges the gap between high-performance model development and optimized deployment environments for Large Language Models.
  • The engineer will build robust inference engines using tools like vLLM, TGI, and NVIDIA Triton to ensure high performance at scale.
  • Success is measured by reducing latency, lowering cost per token, and maintaining system stability during traffic spikes.

Skills & Requirements

Must-have

  • Proficiency in Python, C++, Rust, or Golang
  • Experience with vLLM, TGI, and NVIDIA Triton
  • Infrastructure expertise with Kubernetes and Docker
  • GPU optimization with NVIDIA CUDA and TensorRT
  • Experience with high-throughput, low-latency model serving

Nice-to-have

  • Knowledge of speculative decoding or PagedAttention
  • Contributions to open-source inference libraries
  • CUDA or Triton kernel development
  • MLOps or SRE background in AI systems
  • Core ML optimization on Apple Silicon

Key Requirements

  • Proficiency in Python, C++, Rust, or Golang
  • Hands-on experience with vLLM, TensorRT, llama.cpp, and Ollama
  • Strong familiarity with the AWS, GCP, and Azure cloud platforms

Work Rights

Not specified