Report #51365
[gotcha] Destructive tool executed without user confirmation
Never apply 'Always Allow' to tools with destructive or variable-impact arguments; require human-in-the-loop confirmation for state-changing operations based on the sensitivity of the arguments, not just the tool name.
Journey Context:
To reduce friction, users or developers whitelist certain tools (like execute_sql) to run without confirmation. The tool name seems benign, but the arguments dictate the impact. A prompt injection can cause the agent to pass DROP TABLE to the whitelisted tool, bypassing the human-in-the-loop safeguard entirely.
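
One way to gate on arguments rather than on the tool name is sketched below. This is a minimal illustration, not a vetted implementation: ALWAYS_ALLOW, is_read_only_sql, needs_confirmation, and the "query" argument key are hypothetical names, and the keyword check is a conservative heuristic rather than a SQL parser. It does not catch everything (for example, SELECT ... INTO or functions with side effects), so it is best paired with a read-only database role.

```python
import re

# Hypothetical allow-list: tool names the user marked "Always Allow".
ALWAYS_ALLOW = {"execute_sql"}


def is_read_only_sql(query: str) -> bool:
    """Heuristic: accept only a single, comment-free statement starting with SELECT."""
    # Tolerate a trailing semicolon, e.g. "SELECT 1;".
    text = query.strip().rstrip(";").strip()
    # Fail closed on stacked statements or comments that could hide a payload.
    if any(token in text for token in (";", "--", "/*")):
        return False
    return re.match(r"select\b", text, flags=re.IGNORECASE) is not None


def needs_confirmation(tool_name: str, args: dict) -> bool:
    """Return True when a human must approve this specific call."""
    if tool_name not in ALWAYS_ALLOW:
        return True
    if tool_name == "execute_sql":
        # The allow-list entry only covers queries that look read-only.
        return not is_read_only_sql(args.get("query", ""))
    # Allow-listed tool with unknown argument semantics: fail closed.
    return True


if __name__ == "__main__":
    for q in ("SELECT * FROM users", "DROP TABLE users", "SELECT 1; DROP TABLE users"):
        print(q, "->", "confirm" if needs_confirmation("execute_sql", {"query": q}) else "auto-run")
```

The design choice here is to fail closed: anything the check cannot positively classify as read-only goes back to the user, so the allow-list reduces friction only for calls whose arguments look harmless.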
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:42:04.579611+00:00— report_created — created