Report #3849
[tooling] MCP agent executes destructive tools (delete, update) without explicit user confirmation
Set `annotations.destructiveHint: true` and `annotations.openWorldHint: false` in the Tool definition to signal client-side safety guards; implement pre-execution hooks in the MCP client that inspect these annotations and prompt for user confirmation before invoking the tool.
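As a rough illustration, an annotated tool definition might look like the sketch below, written as a Python dict that mirrors the JSON a server returns from `tools/list`. The field names `readOnlyHint`, `destructiveHint`, and `openWorldHint` follow the MCP specification's tool annotations; the tool itself (`delete_records`, its description, and its schema) is hypothetical and exists only for this example.

```python
import json

# Hypothetical tool, shaped like one entry in a `tools/list` result.
delete_records_tool = {
    "name": "delete_records",
    "description": "Permanently delete records matching a filter.",
    "inputSchema": {
        "type": "object",
        "properties": {"filter": {"type": "string"}},
        "required": ["filter"],
    },
    "annotations": {
        "title": "Delete records",
        "readOnlyHint": False,    # the tool modifies its environment
        "destructiveHint": True,  # the modification may be irreversible
        "openWorldHint": False,   # closed, known domain (no live web access)
    },
}

# Serialize to the JSON form a client would receive.
print(json.dumps(delete_records_tool, indent=2))
```

Keeping the hints in `annotations` rather than in the description means a client can branch on them without parsing free text.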
Journey Context:
By default, MCP tools are opaque to the client: it receives a name, description, and input schema, but no machine-readable indication of side effects. This is dangerous, because an agent may invoke a `delete_production_database` tool during routine troubleshooting when the LLM takes a vague request ('clean things up') literally. The MCP specification provides the `annotations` object on the Tool definition to address this. Setting `destructiveHint: true` marks a tool as able to perform destructive, potentially irreversible updates, while `openWorldHint: false` indicates it operates on a closed, known domain (as opposed to, for example, searching the live web). Both hints default to `true` when omitted, and `destructiveHint` is only meaningful when `readOnlyHint` is `false`.

Critically, these annotations are structured metadata fields in the Tool definition, not just text in the description. An MCP client (Claude Desktop, Cursor, etc.) can therefore implement middleware that intercepts calls to tools annotated as destructive and surfaces a confirmation dialog ('Are you sure you want to delete X?') before execution. This is more robust than relying on the LLM to 'ask first' in its response, which can be bypassed by jailbreaking or prompt injection. Two caveats apply. Annotations are hints supplied by the server, so a client should rely on them only when the server is trusted. The pattern also requires client support, so confirm that your client actually honors these annotations before depending on it for production safety.
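The confirmation step described above could be sketched as a small client-side gate. This is a minimal illustration under stated assumptions, not any particular client's implementation: `needs_confirmation`, `guarded_call`, and the `invoke` and `confirm` callbacks are hypothetical names, and the tool is assumed to be the plain dict a client receives from `tools/list`. Only the hint names and their defaults come from the MCP specification.

```python
from typing import Any, Callable

Tool = dict[str, Any]


def needs_confirmation(tool: Tool) -> bool:
    """Decide whether a tool call should be confirmed by the user first."""
    annotations = tool.get("annotations") or {}
    # A tool declared read-only is let through. Only trust this hint
    # when the server itself is trusted.
    if annotations.get("readOnlyHint", False):
        return False
    # destructiveHint defaults to true, so an unannotated tool is
    # treated as destructive.
    return bool(annotations.get("destructiveHint", True))


def guarded_call(
    tool: Tool,
    arguments: dict[str, Any],
    invoke: Callable[[str, dict[str, Any]], dict[str, Any]],
    confirm: Callable[[Tool, dict[str, Any]], bool],
) -> dict[str, Any]:
    """Run `invoke` only if the call is safe or the user approved it."""
    if needs_confirmation(tool) and not confirm(tool, arguments):
        # Return an error-shaped result instead of calling the tool.
        return {
            "isError": True,
            "content": [{"type": "text", "text": "Cancelled by user."}],
        }
    return invoke(tool["name"], arguments)
```

In a real client, `confirm` would open a dialog and `invoke` would send the `tools/call` request. Because the gate runs in client code, outside the model's output, a prompt injection cannot talk its way past it.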
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:19:05.230285+00:00— report_created — created