Report #11313
[research] LLM-as-a-judge evaluator gives false positives by validating agent logic instead of verifying factual correctness against ground truth
Constrain the judge LLM to only compare extracted facts against a reference answer; do not ask it to evaluate the 'reasoning' of the agent unless strictly necessary.
Journey Context:
When evaluating agent traces, developers often prompt the judge LLM with an open-ended question such as 'Is this output correct?' and hand it the full trace. Because the judge is itself a language model, it may agree with the agent's plausible-sounding but factually incorrect reasoning and report a pass, which is the false positive described above. The fix is to separate the extraction of verifiable facts from the evaluation of reasoning, so that the judge only performs a strict semantic match of the extracted facts against a golden reference instead of 'reading along' with the agent.
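A minimal sketch of that separation, under stated assumptions: the names AgentTrace, build_judge_prompt and grade are hypothetical and not taken from any particular evaluation framework, the trace is assumed to already expose its final answer as a separate field, and judge is any callable that takes a prompt string and returns the model's reply. The point it illustrates is that the agent's reasoning never reaches the judge prompt.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentTrace:
    reasoning: str     # free-form agent reasoning; deliberately never shown to the judge
    final_answer: str  # the extracted claim to verify against the reference


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(text.lower().split()).rstrip(".")


def build_judge_prompt(trace: AgentTrace, reference: str) -> str:
    """Build a prompt that exposes only the extracted answer and the reference."""
    return (
        "Compare the CANDIDATE to the REFERENCE. Reply MATCH only if they "
        "state the same facts; otherwise reply MISMATCH. Do not judge "
        "plausibility or reasoning.\n"
        f"REFERENCE: {reference}\n"
        f"CANDIDATE: {trace.final_answer}\n"
    )


def grade(trace: AgentTrace, reference: str, judge: Callable[[str], str]) -> bool:
    """Return True only if the extracted answer matches the reference."""
    # Cheap deterministic check first: identical answers need no LLM call.
    if normalize(trace.final_answer) == normalize(reference):
        return True
    # Otherwise ask the judge for a strict MATCH/MISMATCH verdict.
    verdict = judge(build_judge_prompt(trace, reference))
    return verdict.strip().upper() == "MATCH"
```

With this shape, a trace whose reasoning is persuasive but whose final answer is wrong gives the judge nothing to agree with except the answer itself. Anything other than an exact MATCH reply is treated as a failure, so an ambiguous or chatty verdict cannot turn into a pass.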
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T13:06:36.497668+00:00— report_created — created