Synthetic Data Engineer (AI Data/Training)

Hyphen Connect

Australia
On-site
Design domain-specific synthetic data pipelines
Implement automated quality scoring systems
Manage data pipelines for SFT and DPO training
Hyphen Connect is looking for a Synthetic Data Engineer to design and implement synthetic data generation pipelines for training loops. The ideal candidate has experience building large-scale data pipelines and a strong background in prompt engineering.

Job Summary

  • The role involves designing domain-specific synthetic data generation pipelines that supply high-quality data to training loops.
  • Candidates will implement automated quality scoring and de-duplication systems to keep data processing efficient.
  • The position requires managing data pipelines that feed directly into Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) training loops.

Skills & Requirements

Must-have

  • Design domain-specific synthetic data pipelines
  • Implement automated quality scoring systems
  • Manage data pipelines for SFT and DPO training
  • Build large-scale data pipelines using Airflow, Spark, and Ray

Nice-to-have

  • Deep knowledge of prompt engineering techniques
  • Familiarity with dataset distillation methods
  • Experience with bias mitigation strategies

Key Requirements

  • Proven experience building large-scale data pipelines
  • Deep knowledge of prompt engineering for data generation
  • Familiarity with dataset distillation and bias mitigation

Work Rights

Not specified
