Synthetic Data Engineer (AI Data/Training)

Hyphen Connect

Hong Kong, Hong Kong
On-site
Design domain-specific synthetic data pipelines
Implement automated quality scoring systems
Manage data pipelines for SFT and DPO training
Hyphen Connect is looking for a Synthetic Data Engineer to create and manage synthetic data generation pipelines for training models. The ideal candidate should have extensive experience in building large-scale data pipelines and a strong understanding of prompt engineering and bias mitigation.

Job Summary

  • The role focuses on designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Candidates will implement automated quality scoring and de-duplication systems to keep the generated data high-quality.
  • This position is critical for managing data pipelines that feed directly into Supervised Fine-Tuning and Direct Preference Optimization training loops.

Skills & Requirements

Must-have

  • Design domain-specific synthetic data pipelines
  • Implement automated quality scoring systems
  • Manage data pipelines for SFT and DPO training

Nice-to-have

  • Experience with dataset distillation techniques
  • Knowledge of bias mitigation strategies
  • Proficiency in self-instruct prompting methods

Key Requirements

  • Proven experience building large-scale data pipelines
  • Deep knowledge of prompt engineering
  • Familiarity with Airflow, Spark, and Ray

Work Rights

Not specified
