Lead the development and optimization of data infrastructure supporting Agentic AI initiatives
Job Summary
Lead the development and optimization of data infrastructure supporting Agentic AI initiatives.
Collaborate with ML engineers, AI scientists, and product managers to architect, implement, and maintain robust data pipelines powering autonomous AI agents.
Drive data platform reliability, scalability, and cost optimization across cloud-based infrastructure.
Skills & Requirements
Must-have
scalable data pipelines
ETL processes
data models
Python and Scala
SQL and NoSQL databases
vector databases
Airflow 2.x
cloud platform (AWS, Azure, or GCP)
Docker, Kubernetes
Terraform
Spark, Dask, Ray
Kafka, Flink
Nice-to-have
streaming and event-driven pipelines
real-time agent feedback
automated data validation
ML frameworks
feature stores
LLM fine-tuning data requirements
autonomous AI agents
prompt engineering
retrieval-augmented generation
semantic caching
LLM evaluation metrics
RAG systems
RLHF data workflows
mentoring junior engineers
Key Requirements
5+ years of professional experience in data engineering, including 2 years focused on ML/AI data infrastructure
Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field
Advanced proficiency in Python and Scala
Expert-level knowledge of SQL and NoSQL databases
Hands-on experience with vector databases
Proficiency with modern data orchestration platforms (e.g., Airflow 2.x)
Extensive experience with at least one major cloud platform (AWS, Azure, or GCP)