Report #15353
[gotcha] Agent executed a destructive or sensitive tool call without user confirmation
Never auto-approve tool calls with side effects (write, delete, send, execute, modify). Implement human-in-the-loop approval gates for all state-modifying tools. Classify tools by risk tier and require escalating consent for higher tiers. Do not trust tool descriptions to accurately represent risk: a tool named 'read_config' could have side effects.
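A minimal sketch of a tiered approval gate, assuming a client-side policy table that the operator fills in after reviewing each tool. The names `RiskTier`, `TOOL_TIERS`, `ToolCall`, `approve` and `gated_call`, the `("files", ...)` entries, and the "retype the tool name" rule for the top tier are all hypothetical illustrations, not part of any MCP client API. Tools that nobody has reviewed fall into the highest tier, so a server-chosen name like 'read_config' earns no trust on its own.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable


class RiskTier(IntEnum):
    READ_ONLY = 0    # reviewed by the client operator: no side effects
    MODIFY = 1       # writes or modifies state
    DESTRUCTIVE = 2  # delete, send, execute: hard or impossible to undo


# Hypothetical client-side policy table keyed by (server, tool).
# Tiers come from the operator's own review, never from the server's
# self-reported name or description.
TOOL_TIERS: dict[tuple[str, str], RiskTier] = {
    ("files", "read_file"): RiskTier.READ_ONLY,
    ("files", "write_file"): RiskTier.MODIFY,
    ("files", "delete_file"): RiskTier.DESTRUCTIVE,
}


@dataclass
class ToolCall:
    server: str
    tool: str
    arguments: dict = field(default_factory=dict)


# Shows a prompt to the human and returns whatever they typed.
Prompt = Callable[[str], str]


def classify(call: ToolCall) -> RiskTier:
    # Anything not explicitly reviewed (e.g. an unvetted 'read_config')
    # falls into the highest tier instead of being trusted by its name.
    return TOOL_TIERS.get((call.server, call.tool), RiskTier.DESTRUCTIVE)


def approve(call: ToolCall, prompt: Prompt) -> bool:
    tier = classify(call)
    if tier == RiskTier.READ_ONLY:
        return True
    summary = f"{call.server}.{call.tool}({call.arguments!r})"
    if tier == RiskTier.MODIFY:
        # Tier 1: a plain yes/no confirmation, defaulting to "no".
        return prompt(f"Allow {summary}? [y/N] ").strip().lower() == "y"
    # Tier 2: escalated consent, the user must retype the tool name.
    answer = prompt(f"DESTRUCTIVE call {summary}. Type '{call.tool}' to confirm: ")
    return answer.strip() == call.tool


def gated_call(call: ToolCall, prompt: Prompt, execute: Callable[[ToolCall], object]):
    # Every tool invocation goes through the gate; a refusal never reaches execute().
    if not approve(call, prompt):
        raise PermissionError(f"user declined {call.server}.{call.tool}")
    return execute(call)
```

In an interactive client, `prompt` could simply be the built-in `input`. Requiring the tool name to be retyped for the top tier is one way to make consent escalate; it is harder to click through on reflex than a second yes/no dialog.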
Journey Context:
MCP clients often offer auto-approve or 'always allow' settings to reduce interaction friction. Developers enable these during testing and forget to disable them in production, or implementations default to auto-approve. Combined with tool poisoning, auto-approval means a compromised MCP server can execute arbitrary destructive actions with zero user visibility. The MCP spec recommends keeping a human in the loop who can deny tool invocations, but it does not mandate any particular approval policy; that is left to the client, so nothing at the protocol level stops a client from shipping with no approval requirement at all. The critical insight is that tool names and descriptions are self-reported by the server and cannot be trusted to accurately represent what the tool does. Only the tool's actual execution behavior matters, and that behavior is invisible to the approval layer.
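A sketch of fail-closed auto-approve handling, assuming the client lets you keep an explicit allowlist. The names `ApprovalConfig`, `definition_hash`, `validate_config` and `may_skip_prompt`, the `"server/tool"` key format, and the choice to pin each entry to a SHA-256 of the reviewed tool definition are hypothetical, one possible mitigation rather than any client's documented behavior. The allowlist is empty by default, is rejected outright in production, and stops applying the moment the server advertises a changed definition (for example, a poisoned description).

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ApprovalConfig:
    # Secure default: nothing is auto-approved unless listed explicitly.
    # Maps "server/tool" to the SHA-256 of the definition that was reviewed.
    auto_approve: dict[str, str] = field(default_factory=dict)
    environment: str = "development"


def definition_hash(tool_def: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so key order
    # does not change the hash.
    canonical = json.dumps(tool_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_config(config: ApprovalConfig) -> None:
    # Fail at startup instead of silently carrying a testing shortcut
    # into production.
    if config.environment == "production" and config.auto_approve:
        raise ValueError(
            "auto-approve entries are not allowed in production: "
            + ", ".join(sorted(config.auto_approve))
        )


def may_skip_prompt(config: ApprovalConfig, server: str, tool_def: dict) -> bool:
    key = f"{server}/{tool_def.get('name')}"
    pinned = config.auto_approve.get(key)
    # Skip the prompt only if the server still advertises exactly the
    # definition that was reviewed; any change forces a prompt again.
    return pinned is not None and pinned == definition_hash(tool_def)
```

Pinning only detects changes to what the server *claims*; it says nothing about what the tool actually does when executed. Even a matching hash is therefore best reserved for tools the operator has already judged read-only.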
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:50:57.301392+00:00— report_created — created