Synthetic Data Engineer (AI Data/Training)

Hyphen Partners

Hong Kong, Hong Kong
On-site
Synthetic data generation pipelines
Self-instruct and constitutional prompting
Automated quality scoring systems

Hyphen Partners is seeking a Synthetic Data Engineer to design and implement synthetic data generation pipelines that support data processing and model training. The ideal candidate should have experience with large-scale data pipelines and a strong understanding of prompt engineering for data generation.

Job Summary

  • The role involves designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting techniques.
  • Candidates will implement automated quality scoring and de-duplication systems to keep the generated data high quality.
  • This position owns the critical data pipelines that feed directly into SFT (supervised fine-tuning) and DPO (direct preference optimization) training loops.

Skills & Requirements

Must-have

  • Synthetic data generation pipelines
  • Self-instruct and constitutional prompting
  • Automated quality scoring systems
  • Large-scale data pipeline tools
  • SFT and DPO training loop integration

Nice-to-have

  • Dataset distillation expertise
  • Bias mitigation strategies
  • Innovative problem-solving skills

Key Requirements

  • Experience with Airflow, Spark, or Ray
  • Deep knowledge of prompt engineering
  • Familiarity with dataset distillation

Work Rights

Not specified
