Research, Pre-training Data

thinkingmachines.click

San Francisco, California, United States
On-site

Job Summary

  • The role sits at the core of the roadmap, blending fundamental research with large-scale data engineering to assemble pre-training datasets.
  • Candidates will design methods for sourcing, curating, and analyzing text, code, and multimodal data while ensuring responsible and ethical use.
  • Thinking Machines offers generous benefits including unlimited PTO, paid parental leave, and relocation support alongside competitive compensation.

Salary

  • Base: $350,000 - $475,000 USD annually
  • Bonus/Equity: Not specified
  • Benefits: Health, dental, vision, unlimited PTO, paid parental leave, relocation support

Skills & Requirements

Must-have

  • Proficiency in Python programming
  • Experience with deep learning frameworks
  • Large-scale data engineering skills
  • Ability to write scalable code

Nice-to-have

  • Strong grasp of probability and statistics
  • Contributions to open datasets or research
  • Knowledge of data ethics and licensing
  • Experience with multimodal data processing

Key Requirements

  • Bachelor's degree in Computer Science or related field
  • PhD in CS, ML, Physics, or Mathematics preferred
  • Equivalent industry research experience accepted

Work Rights

Not specified

Sponsorship: available
