AI QA Trainer - LLM Evaluation - Freelance Project

Invisible Expert Marketplace

Remote
$6 - $65 per hour
LLM safety evaluation
Prompt robustness testing
Hallucination detection

Job Summary

  • We need your expertise to harden model reasoning and reliability by challenging advanced language models on various tasks.
  • You will converse with the model on real-world scenarios, verify factual accuracy, design test plans, and capture reproducible error traces.
  • You will also partner on adversarial red-teaming, automation (Python/SQL), and dashboards that track quality changes over time.

Salary

$6 - $65 per hour

Skills & Requirements

Must-have

  • LLM safety evaluation
  • prompt robustness testing
  • hallucination detection
  • bias and fairness audits
  • chain-of-reasoning reliability
  • tool-use correctness
  • retrieval-augmentation fidelity

Nice-to-have

  • metacognitive communication
  • improving prompt engineering
  • developing evaluation metrics

Key Requirements

  • Bachelor's, Master's, or PhD in CS, Data Science, Computational Linguistics, Statistics, or related field
  • Experience shipping QA for ML/AI systems
  • Safety/red-team experience
  • Experience with test automation frameworks (e.g., pytest)
  • LLM eval tooling experience (e.g., OpenAI Evals, RAG evaluators, W&B)

Work Rights

Not specified
