Hyphen Connect is looking for a Synthetic Data Engineer to design and implement synthetic data generation pipelines for training loops. The ideal candidate has experience building large-scale data pipelines and a strong background in prompt engineering.
Job Summary
The role involves designing domain-specific synthetic data generation pipelines that supply high-quality data to training loops.
Candidates will implement automated quality scoring and de-duplication systems to keep data processing efficient.
The position requires managing data pipelines that feed directly into Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) training loops.
Matching Summary
Match Score: 85
Skills & Requirements
Must-have
design domain-specific synthetic data pipelines
implement automated quality scoring systems
manage data pipelines for SFT and DPO training
build large-scale data pipelines using tools such as Airflow, Spark, and Ray
Nice-to-have
deep knowledge of prompt engineering techniques
familiarity with dataset distillation methods
experience with bias mitigation strategies
Key Requirements
Proven experience building large-scale data pipelines
Deep knowledge of prompt engineering for data generation
Familiarity with dataset distillation and bias mitigation