Hyphen Connect is looking for a Synthetic Data Engineer to create and manage synthetic data generation pipelines for training models. The ideal candidate should have extensive experience in building large-scale data pipelines and a strong understanding of prompt engineering and bias mitigation.
Job Summary
The role focuses on designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
Candidates will implement automated systems for quality scoring and de-duplication to ensure high-quality data management.
This position is critical for managing data pipelines that feed directly into Supervised Fine-Tuning and Direct Preference Optimization training loops.
Skills & Requirements
Must-have
design domain-specific synthetic data pipelines
implement automated quality scoring systems
manage data pipelines for SFT and DPO training
Nice-to-have
experience with dataset distillation techniques
knowledge of bias mitigation strategies
proficiency in self-instruct prompting methods
Key Requirements
Proven experience building large-scale data pipelines