Report #31136
[research] LLM-as-a-judge evaluator gives false positives because it shares the same blind spots as the agent
Use a judge from a different (and typically more capable) model family than the agent (e.g., a Claude 3.5 Sonnet judge for a GPT-4o-mini agent). Include a 'gold standard' reference trace in the judge prompt to anchor the evaluation, rather than relying on open-ended grading.
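A minimal sketch of both halves of this workaround, assuming agent traces are available as lists of step strings. The family table, `pick_judge_model`, `build_judge_prompt`, `parse_verdict`, and the PASS/FAIL first-line convention are all hypothetical illustrations, not part of any vendor SDK. The actual call to the judge model is deliberately left out so the snippet runs without an API key.

```python
# Hypothetical prefix -> vendor-family table; adjust to the model IDs in your stack.
MODEL_FAMILIES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
}


def model_family(model: str) -> str:
    """Return the vendor family for a model ID, or 'unknown' if no prefix matches."""
    for prefix, family in MODEL_FAMILIES.items():
        if model.startswith(prefix):
            return family
    return "unknown"


def pick_judge_model(agent_model: str, candidates: list[str]) -> str:
    """Return the first candidate judge whose family differs from the agent's."""
    agent_family = model_family(agent_model)
    for candidate in candidates:
        family = model_family(candidate)
        if family != "unknown" and family != agent_family:
            return candidate
    raise ValueError(f"no judge candidate outside family {agent_family!r}")


# Reference-anchored prompt: the judge compares against a known-good trace
# instead of grading the candidate open-endedly.
JUDGE_TEMPLATE = """You are grading an agent's trajectory against a known-good reference.

Task:
{task}

Reference trace (gold standard):
{reference}

Candidate trace (to grade):
{candidate}

Compare the candidate to the reference step by step. Flag any step that is missing,
reordered, or reaches a different result. Do not give credit for reasoning the reference
does not support. Answer PASS or FAIL on the first line, then a short justification.
"""


def build_judge_prompt(task: str, reference_steps: list[str], candidate_steps: list[str]) -> str:
    """Render numbered reference and candidate traces into the judge prompt."""
    def numbered(steps: list[str]) -> str:
        return "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))

    return JUDGE_TEMPLATE.format(
        task=task,
        reference=numbered(reference_steps),
        candidate=numbered(candidate_steps),
    )


def parse_verdict(judge_output: str) -> bool:
    """True only if the judge's first non-blank line starts with PASS."""
    lines = judge_output.strip().splitlines()
    return bool(lines) and lines[0].strip().upper().startswith("PASS")
```

For example, `pick_judge_model("gpt-4o-mini", ["gpt-4o", "claude-3-5-sonnet-20240620"])` skips `gpt-4o` (same family as the agent) and returns the Claude model. Failing loudly when no cross-family judge is available is a design choice here: silently falling back to a same-family judge would reintroduce the shared blind spots this report describes.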
Journey Context:
Using the same model to evaluate its own output creates an echo-chamber effect in which the judge rationalizes the agent's flawed logic instead of flagging it. Cross-model evaluation reduces shared blind spots, while reference-based grading turns open-ended, subjective grading into an objective comparison against a known-good trajectory.
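To make "objective comparison against a known-good trajectory" concrete, here is a sketch of a deterministic, non-LLM check that diffs the agent's step sequence against the reference using the standard-library `difflib.SequenceMatcher`. This is a swapped-in illustration, not the report's method: it assumes each trace is a list of step labels (e.g., tool-call names), and `compare_trajectories` is a hypothetical helper. It can only catch missing, extra, or reordered steps, not whether the reasoning inside a step was sound.

```python
from difflib import SequenceMatcher


def compare_trajectories(reference: list[str], candidate: list[str]) -> dict:
    """Diff a candidate step sequence against a known-good reference sequence."""
    # autojunk=False so frequently repeated steps are not silently ignored.
    matcher = SequenceMatcher(None, reference, candidate, autojunk=False)
    missing: list[str] = []  # reference steps the candidate skipped or replaced
    extra: list[str] = []    # candidate steps that are not in the reference
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            missing.extend(reference[i1:i2])
        if tag in ("insert", "replace"):
            extra.extend(candidate[j1:j2])
    return {
        "similarity": matcher.ratio(),  # 2*matches / (len(ref) + len(cand))
        "missing": missing,
        "extra": extra,
        "exact": reference == candidate,
    }


reference = ["search_docs", "read_file", "run_tests", "submit"]
candidate = ["search_docs", "submit"]
print(compare_trajectories(reference, candidate))
```

Here the candidate skipped `read_file` and `run_tests`, which shows up in `missing` regardless of how persuasive the agent's narration was. A check like this could run before the cross-model judge as a cheap gate, with the judge reserved for the semantic comparison.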
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:39:04.263438+00:00— report_created — created