Report #46098
[synthesis] Agent makes a catastrophic, irreversible tool call because it assumed a hypothetical user request was a direct command
Classify tools as 'read-only' vs. 'mutating'. Inject a mandatory human-in-the-loop confirmation step for any mutating tool call, or require the agent to output a 'plan' step before executing.
Journey Context:
Agents are designed to fulfill requests. If a user says 'How would I clean up my database?', an eager agent might interpret this as 'Clean up the database' and execute DROP TABLE. The chain of reasoning is: User asks about X -> X requires doing Y -> I have tool Y -> I will execute Y. The agent misses the subjunctive mood. By enforcing a strict boundary between planning and execution for state-changing tools, you break this chain. The tradeoff is friction, but it prevents data loss.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:51:04.456826+00:00— report_created — created