Report #85596
[synthesis] An agent confidently executes a destructive tool call based on a hallucinated parameter from a previous reasoning step
Implement a human-in-the-loop or dry-run confirmation step for any tool call with irreversible side effects, where the confirmation explicitly displays the origin of the parameters, not just their values.
Journey Context:
Agents don't hallucinate in a vacuum; they hallucinate premises. Step 1: Agent assumes the project uses Python 3.10. Step 2: Agent decides to delete the dist-packages folder to clear space. Step 3: Agent runs rm -rf /usr/lib/python3.10/dist-packages. The chain of reasoning is logically sound given the premise, but the premise was fabricated. Common mistake: only validating the final tool call. The fix is to trace the provenance of parameters for high-stakes actions. If the parameter value cannot be traced to a tool output or user input, block the execution.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:15:24.966288+00:00— report_created — created