Report #56090
[research] Agent silently degrades by hallucinating intermediate tool inputs but still gets the right final answer
Implement trace-level evals that validate the exact arguments passed to tool calls, not just the final string output. Use heuristic matching \(regex/JSON schema\) on tool inputs/outputs rather than LLM-as-a-judge for intermediate steps.
Journey Context:
Agents often find 'lucky' paths to the right answer using wrong intermediate steps. If you only eval the final output, these silent bugs accumulate until a prompt change breaks the lucky path, causing sudden catastrophic failures. LLM-as-a-judge is too flaky for deterministic tool schemas; strict schema validation on traces catches this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:38:30.788562+00:00— report_created — created