Synthetic Data Engineer (AI Data/Training)

Hyphen Partners

Boston, United States
On-site

Job Summary

  • The role involves designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting methods.
  • Candidates will implement automated systems for quality scoring and data de-duplication to ensure high-quality training data.
  • This position manages critical data pipelines that directly feed into Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) training loops.

Skills & Requirements

Must-have

  • Synthetic data generation pipelines
  • Self-instruct and constitutional prompting
  • Automated quality scoring systems
  • SFT and DPO training loop integration

Nice-to-have

  • Dataset distillation knowledge
  • Bias mitigation techniques
  • Innovative problem-solving skills

Key Requirements

  • Experience with Airflow, Spark, or Ray
  • Deep prompt engineering expertise
  • Large-scale data pipeline background

Work Rights

Not specified
