Design, develop, and deploy scalable, high-performance, and production-grade backend services and distributed systems to support large-scale model inference
Job Summary
Design, develop, and deploy scalable, high-performance, and production-grade backend services and distributed systems to support large-scale model inference.
Ensure the reliability, scalability, and efficiency of our production systems using monitoring and observability tools such as Prometheus and Grafana.
Manage and optimize our Google Cloud Platform (GCP) infrastructure and orchestrate workloads with Kubernetes.
Skills & Requirements
Must-have
Distributed systems at scale
High-performance backend services
Low-latency, high-throughput services
Cloud infrastructure management (GCP)
Kubernetes workload orchestration
Monitoring and observability tools (Prometheus, Grafana)