AI Evaluation & Safety Engineer

BRG

BRG is seeking an AI Evaluation & Safety Engineer to enhance the reliability of its AI tools used in various consulting domains. The ideal candidate will have extensive experience in machine learning evaluation, particularly with large language models, and will be responsible for establishing evaluation frameworks and monitoring systems to ensure AI outputs maintain high quality and safety standards.

Job Summary

  • This role owns the evaluation discipline that ensures AI tools are trustworthy enough for BRG experts to rely on in complex litigation and healthcare engagements.
  • You will design end-to-end evaluation frameworks, implement automated metrics like ROUGE and BERTScore, and build guardrails to prevent hallucinations and unsafe outputs.
  • The position requires partnering with subject-matter experts to curate gold-standard datasets and operationalizing evaluation within CI/CD pipelines so no model ships without passing quality gates.

Skills & Requirements

Must-have

  • 3+ years ML evaluation experience
  • Python fundamentals and LLM APIs
  • Building testing and benchmarking pipelines
  • pytest and pandas proficiency
  • LLM failure mode understanding
  • Retrieval quality and hallucination detection

Nice-to-have

  • Production RAG or agentic pipeline experience
  • Adversarial red-teaming and prompt-injection testing
  • Observability tooling (Grafana, Prometheus)
  • CI/CD integration for evaluation gates
  • AI compliance in regulated industries
  • Model interpretability techniques

Key Requirements

  • 3+ years experience in ML evaluation or AI safety
  • Strong Python programming skills
  • Experience with LLM evaluation frameworks

Work Rights

Not specified
