Synthetic Data Engineer (AI Data/Training)

Hyphen Partners

Singapore, Singapore
On-site
Hyphen Partners is looking for a Synthetic Data Engineer to design and implement synthetic data generation pipelines that improve the quality of data feeding model training loops. The ideal candidate has extensive experience building data pipelines and deep knowledge of prompt engineering. Responsibilities include managing data pipelines and implementing automated quality scoring systems.

Job Summary

  • The role focuses on designing domain-specific synthetic data generation pipelines that produce high-quality training data.
  • Candidates will implement automated quality scoring and de-duplication systems to support model training.
  • The position manages pipelines that feed directly into supervised fine-tuning (SFT) and direct preference optimization (DPO) training loops.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • Synthetic data generation pipelines
  • Self-instruct and constitutional prompting
  • Automated quality scoring systems
  • Large-scale data pipeline tools
  • SFT and DPO training loop integration

Nice-to-have

  • Dataset distillation expertise
  • Bias mitigation strategies
  • Innovative problem-solving skills

Key Requirements

  • Experience with Airflow, Spark, or Ray
  • Deep knowledge of prompt engineering
  • Familiarity with dataset distillation

Work Rights

Not specified
