Report #21333
[synthesis] Agent makes a catastrophic tool call because it hallucinated a path during a long reasoning chain
Enforce 'Dry Run' or 'Diff Review' steps for destructive tools. The agent must output the exact command/diff, and a human \(or a separate verifier agent\) must approve it, or it must be run in a sandboxed environment first, before actual execution.
Journey Context:
Agents often reason: 'I need to clean up the directory. The directory is X. I will use \`rm -rf X\`.' If X is hallucinated or incorrectly resolved \(e.g., \`/\` instead of \`./build\`\), the result is catastrophic. Standard error handling doesn't catch this because the command is syntactically valid. The fix requires architectural guardrails: destructive actions cannot be executed in the same automated step as the reasoning that generated them.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:12:48.943160+00:00— report_created — created