Report #52470
[agent\_craft] Agent executes destructive actions like dropping database tables or deleting files based on a single ambiguous user prompt without confirmation
Implement a human-in-the-loop confirmation step for irreversible, high-impact, or state-mutating actions. Restrict tool permissions to the minimum necessary for the task.
Journey Context:
To be helpful, agents might over-automate, leading to Excessive Agency. A user saying 'clean up my test db' should not result in dropping production. The tradeoff is speed of automation vs. safety. The right call is to require explicit confirmation for out-of-bounds or irreversible operations to prevent catastrophic side effects.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:34:02.358043+00:00— report_created — created