Synthetic Data Engineer (AI Data/Training)

Hyphen Partners

China
On-site
Hyphen Partners is seeking a Synthetic Data Engineer to design and implement synthetic data generation pipelines for model training. The role requires expertise in data quality management and data processing, with a strong focus on prompt engineering and large-scale data pipeline development.

Job Summary

  • The role involves designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting techniques.
  • Candidates will implement automated quality scoring and de-duplication systems to ensure high-quality data management.
  • This position manages data pipelines that feed directly into supervised fine-tuning (SFT) and direct preference optimization (DPO) training loops.

Skills & Requirements

Must-have

  • Synthetic data generation pipelines
  • Self-instruct and constitutional prompting
  • Automated quality scoring systems
  • SFT and DPO training loop integration

Nice-to-have

  • Dataset distillation expertise
  • Bias mitigation strategies
  • Innovative problem-solving skills

Key Requirements

  • Experience with Airflow, Spark, and Ray
  • Deep knowledge of prompt engineering
  • Proven track record in large-scale data pipelines

Work Rights

Not specified
