Synthetic Data Engineer (AI Data/Training)

Hyphen Partners

Oregon, United States

Hyphen Partners is looking for a Synthetic Data Engineer to design and implement synthetic data generation pipelines that improve the quality of model training data. The ideal candidate has experience with large-scale data pipelines and a strong understanding of prompt engineering.

Job Summary

  • The role focuses on designing domain-specific synthetic data generation pipelines using self-instruct and constitutional prompting.
  • Candidates will implement automated quality scoring and de-duplication systems to keep the generated data high quality.
  • The position involves managing data pipelines that feed directly into SFT and DPO training loops.

Skills & Requirements

Must-have

  • Synthetic data generation pipelines
  • Self-instruct and constitutional prompting
  • Automated quality scoring systems
  • De-duplication system implementation
  • SFT and DPO training loop management

Nice-to-have

  • Dataset distillation knowledge
  • Bias mitigation techniques
  • Innovative problem-solving skills

Key Requirements

  • Experience with Airflow, Spark, or Ray
  • Deep knowledge of prompt engineering
  • Familiarity with dataset distillation

Work Rights

Not specified
