Staff AI Engineer, Model Post-training and Alignment
OKX Australia Pty Ltd
Australia
On-site
Large model post-training
Preference optimization
Reinforcement learning
OKX Australia is seeking a Staff AI Engineer specializing in large model post-training and alignment, focusing on optimizing model performance and deployment. The ideal candidate has extensive experience in machine learning with a strong emphasis on post-training methodologies for large language models.
Job Summary
Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.
Build and refine reward models to support alignment and downstream optimization.
Optimize inference efficiency and deploy models using low-latency serving frameworks such as vLLM and SGLang.
Skills & Requirements
Must-have
large model post-training
preference optimization
reinforcement learning
domain-specific data strategies
RLAIF systems
low-latency inference
Nice-to-have
crypto and blockchain
friendly and rewarding environment
production-grade deployment
Key Requirements
8 years of industry experience
Bachelor's degree in Computer Science, AI, Machine Learning, or a related field
Deep familiarity with DPO, GRPO, and RL-based post-training
Experience training specialized small models from scratch
Experience deploying models in low-latency production environments