Data Engineer - Foundational

Harmattan AI

Paris, France
On-site

Job Summary

  • Harmattan AI is a next-generation defense prime building autonomous systems following a $200M Series B funding round.
  • The role involves managing terabytes of raw video data to ensure ML engineers can focus on architecture rather than data wrangling.
  • Candidates must architect storage-to-GPU pipelines that maintain over 90% GPU utilization without I/O bottlenecks.

Skills & Requirements

Must-have

  • 5-6+ years of experience building terabyte-scale pipelines
  • Expertise in Apache Spark, Ray, or Apache Beam
  • Proficiency with WebDataset, TFRecords, or Parquet formats
  • Experience optimizing multi-node GPU data loading
  • Knowledge of sensor synchronization algorithms

Nice-to-have

  • Strong command of distributed computing tools
  • Familiarity with ML data versioning tools like DVC
  • Systems-thinking mindset in fast-paced environments
  • Experience with MinIO or S3 tiering solutions

Key Requirements

  • BS or MS in Computer Science, Software Engineering, or Distributed Systems
  • Deep knowledge of operating systems, networking, and parallel computing
  • Proven track record of maximizing multi-node GPU utilization

Work Rights

Not specified