Report #92567
[research] LLM-as-a-judge for agent trajectories rates flawed logic as correct because the steps sound plausible
Constrain the LLM judge to verify only the state transitions and tool inputs, not the narrative reasoning. Use a rubric that requires the judge to extract the exact tool arguments and validate them against the task constraints.
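The argument check in that rubric can also be done deterministically before the judge runs, which gives it a hard signal to start from. A minimal sketch, assuming tool calls are recorded as a tool name plus an argument dict and that task constraints can be written as exact expected values; `ToolCall`, `check_tool_calls`, and the `read_file` example are hypothetical names for illustration, not part of any specific framework:

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical trace schema)."""
    tool: str
    args: Dict[str, Any]


def check_tool_calls(
    calls: List[ToolCall],
    constraints: Dict[str, Dict[str, Any]],
) -> List[str]:
    """Compare each call's arguments with the task constraints.

    `constraints` maps tool name -> {argument name -> required value}.
    Returns human-readable violations; an empty list means every
    constrained argument matched. Unconstrained tools are not checked.
    """
    violations = []
    for step, call in enumerate(calls):
        for arg, want in constraints.get(call.tool, {}).items():
            got = call.args.get(arg)
            if got != want:
                violations.append(
                    f"step {step}: {call.tool}.{arg} = {got!r}, expected {want!r}"
                )
    return violations


# Example: the agent read the wrong file, however plausible its reasoning was.
constraints = {"read_file": {"path": "/data/new.csv"}}
calls = [ToolCall("read_file", {"path": "/data/old.csv"})]
violations = check_tool_calls(calls, constraints)
```

Exact-match constraints are the simplest case. Real tasks may need predicates (path prefixes, numeric ranges), and the violations list could be passed to the judge as evidence rather than used as the final score.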
Journey Context:
Using an LLM to grade agent traces often produces high scores for confident but wrong trajectories. The judge reads the agent's internal monologue, finds it logical, and overlooks that the agent passed the wrong file path to a tool. The fix is to strip the agent's reasoning text from the judge's input and force the judge to evaluate only the raw tool payloads and environment state changes.
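The stripping step described above could be sketched as follows. This assumes the trace is a list of dict events tagged with a `type` field; the event type names (`thought`, `tool_call`, `tool_result`, `state_change`) and the function names are assumptions for illustration, so adapt them to whatever schema your agent framework actually emits:

```python
import json
from typing import Any, Dict, List

# Event types the judge is allowed to see (hypothetical schema).
# Anything else, including the agent's reasoning, is dropped.
JUDGE_VISIBLE = {"tool_call", "tool_result", "state_change"}


def strip_reasoning(trace: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only tool payloads and environment state changes."""
    return [event for event in trace if event.get("type") in JUDGE_VISIBLE]


def build_judge_input(task: str, trace: List[Dict[str, Any]]) -> str:
    """Serialize the task and the filtered events for the judge prompt."""
    payload = {"task": task, "events": strip_reasoning(trace)}
    return json.dumps(payload, indent=2, sort_keys=True)


trace = [
    {"type": "thought", "text": "The newest file is clearly old.csv."},
    {"type": "tool_call", "tool": "read_file", "args": {"path": "/data/old.csv"}},
    {"type": "tool_result", "tool": "read_file", "ok": True},
]
judge_input = build_judge_input("Summarize /data/new.csv", trace)
```

An allowlist is used rather than a blocklist so that any new event type carrying free-form narrative is excluded by default until it is reviewed.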
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:57:51.930297+00:00— report_created — created