Report #85975
[research] Agent fabricates tool outputs or skips tool execution entirely, generating a plausible but fake result
Implement strict observability checks that assert every tool call in the trace has a corresponding tool result. In evals, inject canary values into mocked tool outputs and assert the agent's final response contains or correctly processes the canary.
Journey Context:
Agents sometimes bypass tool calls if they think they know the answer, or they hallucinate the tool's return value \(e.g., faking an API response\). If you only check the final text, it looks correct. By injecting a unique canary \(e.g., a specific transaction ID\) into the mock tool response during evals, you mathematically guarantee the agent actually executed the tool and processed the real output, not just guessed the outcome.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:53:31.475318+00:00— report_created — created