Staff AI Engineer, Model Post-training and Alignment

OKX Australia Pty Ltd

Australia
On-site
Large model post-training
Preference optimization
Reinforcement learning
OKX Australia is seeking a Staff AI Engineer specializing in large model post-training and alignment, focusing on optimizing model performance and deployment. The ideal candidate has extensive experience in machine learning with a strong emphasis on post-training methodologies for large language models.

Job Summary

  • Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.
  • Build and refine reward models to support alignment and downstream optimization.
  • Optimize inference efficiency and deploy models using low-latency serving frameworks such as vLLM and SGLang.

Matching Summary

Match Score: 85

Skills & Requirements

Must-have

  • large model post-training
  • preference optimization
  • reinforcement learning
  • domain-specific data strategies
  • RLAIF systems
  • low-latency inference

Nice-to-have

  • crypto and blockchain
  • friendly and rewarding environment
  • production-grade deployment

Key Requirements

  • 8 years of industry experience
  • Bachelor's degree in Computer Science, AI, Machine Learning, or a related field
  • Deep familiarity with DPO, GRPO, and RL-based post-training
  • Experience training specialized small models from scratch
  • Experience deploying models in low-latency production environments

Work Rights

Not specified