Report #37875
[research] LLM-judge agrees with agent's flawed reasoning due to shared biases
Use a step-wise, reference-based judge. Provide the judge with the golden trajectory or the ground-truth step, and ask it to evaluate each agent step independently instead of judging the final output holistically.
Journey Context:
Using an LLM to judge an agent's final output often results in the judge forgiving the agent's flawed logic when the final answer is close enough or sounds plausible (sycophancy). Evaluating step by step against a golden trajectory isolates the exact point of failure and keeps the judge from being swayed by the agent's confident but incorrect post-hoc rationalizations.
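A minimal sketch of the step-wise approach, under stated assumptions: the `Judge` callable, `StepVerdict`, `build_step_prompt`, `judge_trajectory`, and `first_failure` are hypothetical names introduced for illustration, and `stub_judge` is an exact-match stand-in for a real LLM call so the example runs offline. The point it shows is that each prompt contains one reference step and one agent step, never the final answer or the agent's own rationale.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical judge interface: takes a prompt, returns text starting
# with "PASS" or "FAIL". A real setup would wrap an LLM call here.
Judge = Callable[[str], str]


@dataclass
class StepVerdict:
    index: int
    passed: bool


def build_step_prompt(reference_step: str, agent_step: str) -> str:
    """Show the judge a single step pair and nothing else."""
    return (
        "Compare the agent step to the reference step.\n"
        f"Reference step: {reference_step}\n"
        f"Agent step: {agent_step}\n"
        "Answer PASS if they are equivalent, otherwise FAIL."
    )


def judge_trajectory(
    golden: List[str], agent: List[str], judge: Judge
) -> List[StepVerdict]:
    """Judge every golden step independently; extra agent steps are ignored."""
    verdicts = []
    for i, reference_step in enumerate(golden):
        agent_step = agent[i] if i < len(agent) else "<missing step>"
        reply = judge(build_step_prompt(reference_step, agent_step))
        verdicts.append(StepVerdict(i, reply.strip().upper().startswith("PASS")))
    return verdicts


def first_failure(verdicts: List[StepVerdict]) -> Optional[int]:
    """Return the index of the first failing step, or None if all passed."""
    for verdict in verdicts:
        if not verdict.passed:
            return verdict.index
    return None


def stub_judge(prompt: str) -> str:
    """Offline stand-in (assumption): exact string match on single-line steps."""
    lines = prompt.splitlines()
    reference_step = lines[1].split(": ", 1)[1]
    agent_step = lines[2].split(": ", 1)[1]
    return "PASS" if reference_step == agent_step else "FAIL"


if __name__ == "__main__":
    golden = ["search flights", "filter by price", "book cheapest"]
    agent = ["search flights", "filter by rating", "book cheapest"]
    verdicts = judge_trajectory(golden, agent, stub_judge)
    print(first_failure(verdicts))
```

Because each step is judged in isolation, a plausible final step ("book cheapest") cannot mask the earlier deviation; the failure is reported at the step where the trajectories first diverge. Aligning steps by position is a simplification, and real trajectories may need a looser alignment.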
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:03:02.746248+00:00— report_created — created