Senior AI Infrastructure Engineer

Gatik AI

Mountain View, CA, United States
On-site

Job Summary

  • Gatik is revolutionizing B2B supply chain logistics with its proprietary Level 4 autonomous technology for middle-mile transportation.
  • This role bridges the gap between research and production by designing scalable infrastructure for distributed training and seamless model deployment.
  • The position requires onsite presence 5 days a week at the Mountain View, CA office to support high-performance AI platform development.

Skills & Requirements

Must-have

  • PyTorch Distributed and Ray Train
  • Multi-GPU cluster optimization (H100/A100)
  • Kubernetes-native GPU scheduling
  • Inference serving with TensorRT, ONNX Runtime, and Triton Inference Server
  • Agent frameworks: LangGraph, CrewAI, AutoGen
  • MLOps lifecycle with MLflow and Argo Workflows
  • Monitoring with Prometheus, Grafana, and OpenTelemetry

Nice-to-have

  • Experience with 3D Gaussian Splatting
  • InfiniBand or RoCE v2 networking tuning
  • Data pipelines with Apache Airflow, Kafka, and Spark
  • Infrastructure as Code with Terraform and Helm
  • Real-time and batch processing optimization

Key Requirements

  • Senior-level experience in AI infrastructure
  • On-site work requirement in Mountain View, CA
  • Expertise in PyTorch Distributed and Kubernetes

Work Rights

Not specified
