LLM Engineer (LLM Evaluation)

42DOT

Pangyo, South Korea
Remote
LLM evaluation framework experience
Python async service development
Argo Workflows and MLflow integration
42DOT is seeking an LLM Engineer to evaluate and improve the performance of large language models (LLMs) by developing assessment systems and platforms. The ideal candidate has more than three years of experience in LLM evaluation and deep learning, strong proficiency in Python, and a collaborative mindset.

Job Summary

  • The role focuses on building a robust evaluation system to ensure the reliability and continuous improvement of Large Language Models (LLMs).
  • Candidates will design automation pipelines using Argo Workflows and MLflow to manage end-to-end model validation and deployment verification.
  • This position requires establishing reproducible benchmarks and protocols to detect performance regressions in rapidly changing LLM environments.

Skills & Requirements

Must-have

  • LLM evaluation framework experience
  • Python async service development
  • Argo Workflows and MLflow integration
  • Benchmark dataset design
  • Evaluation protocol establishment

Nice-to-have

  • Kubernetes container environment expertise
  • GPU cluster distributed inference experience
  • Datadog and Prometheus monitoring setup
  • Large-scale data pipeline design
  • Collaborative team communication skills

Key Requirements

  • 3+ years of experience in LLM training or evaluation
  • Deep learning or NLP research background
  • Experience with lm-eval, HELM, or OpenAI Evals

Work Rights

Not specified
