Machine Learning Engineer - ML Training Platform

PLURALIS RESEARCH

San Francisco, CA, United States
Remote

Job Summary

  • Pluralis Research is pioneering Protocol Learning, a decentralized way to train and deploy AI models that democratizes access beyond large corporations.
  • The role involves owning core systems including infrastructure orchestration, distributed compute, and services integration to enable large-scale model training.
  • The company is backed by tier-1 investors and emphasizes a deeply technical team committed to preventing monopolization of AI model development.

Skills & Requirements

Must-have

  • Infrastructure-as-code multi-cloud deployments
  • Distributed ML training systems
  • GPU workload orchestration
  • Python engineering with concurrency
  • Docker/Kubernetes (EKS) management
  • Fault-tolerant distributed infrastructure

Nice-to-have

  • Startup environment experience
  • Microservices orchestration
  • High attention to detail
  • Team player
  • Passion for decentralized AI

Key Requirements

  • 5+ years of work experience
  • Experience with Pulumi/Terraform/CloudFormation
  • Deep understanding of distributed training workflows
  • Strong Python skills with asyncio and concurrency
  • Experience with Prometheus/Grafana monitoring
  • Experience managing GPU clusters
  • Experience with decentralized networking

Work Rights

Not specified
