Report #16879
[tooling] Agent ignoring tool safety hints and auto-executing destructive operations
Populate the annotations field in MCP tool definitions with boolean hints: destructiveHint, idempotentHint, readOnlyHint, and openWorldHint. Agent frameworks use these to gate auto-execution.
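As a minimal sketch of what that looks like, the tool definition below is written as a Python dict in the shape of an MCP Tool object. The tool name delete_database comes from this report; the description, the input schema, and the specific hint values are illustrative assumptions, not taken from any real server.

```python
import json

# Hypothetical tool definition laid out like an MCP Tool object.
# Only the four hint names come from the report; everything else
# (description, schema, chosen values) is assumed for illustration.
delete_database_tool = {
    "name": "delete_database",
    "description": "Permanently delete a database and all of its data.",
    "inputSchema": {
        "type": "object",
        "properties": {"database": {"type": "string"}},
        "required": ["database"],
    },
    "annotations": {
        "readOnlyHint": False,    # the tool modifies its environment
        "destructiveHint": True,  # the change is destructive, not additive
        "idempotentHint": True,   # assumed: repeating the call changes nothing further
        "openWorldHint": False,   # assumed: touches only this system's own data
    },
}

# Serialized form, as it might appear in a tools/list response.
tool_json = json.dumps(delete_database_tool, indent=2)
print(tool_json)
```

These values are hints supplied by the server, so a client should treat them as advisory, particularly when the server is not trusted.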
Journey Context:
Developers often assume that naming a tool delete or write signals to an agent that confirmation is needed, but an LLM does not reliably infer which tools are destructive from names alone. Without explicit metadata, an agent may auto-execute delete_database with no user confirmation. The hard-won pattern: use MCP's annotations object (part of the Tool type) to declare four boolean hints: readOnlyHint (the tool does not modify its environment), destructiveHint (the tool may make destructive changes rather than purely additive ones; meaningful only when readOnlyHint is false), idempotentHint (repeating a call with the same arguments has no additional effect, so it is safe to retry), and openWorldHint (the tool interacts with external entities beyond a closed system). Clients that honor these hints (Claude Desktop, Cursor, etc.) can show a permission dialog for destructive tools, auto-approve read-only tools, and restrict autonomous loops to tools marked openWorldHint: false. Without the annotations, a client is left with a binary choice: annoy users with a confirmation dialog on every tool call, or risk destructive auto-execution.
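The gating behavior described above can be sketched from the client side as follows. This is an illustrative policy, not how any particular framework implements it: the gate function, its return values, and the sample tools are all hypothetical. The fallback values are the defaults the MCP specification assigns when a hint is omitted, which is why an unannotated tool ends up treated as potentially destructive.

```python
from typing import Any, Mapping

# Defaults applied when a hint is missing (per the MCP specification).
SPEC_DEFAULTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}


def gate(tool: Mapping[str, Any], autonomous: bool = False) -> str:
    """Return "auto" or "confirm" for a tool call (hypothetical policy)."""
    hints = {**SPEC_DEFAULTS, **(tool.get("annotations") or {})}
    # In an unattended loop, only closed-world tools may run freely.
    if autonomous and hints["openWorldHint"]:
        return "confirm"
    # Read-only tools cannot change state, so they are auto-approved.
    if hints["readOnlyHint"]:
        return "auto"
    # Writes that may be destructive always require confirmation.
    if hints["destructiveHint"]:
        return "confirm"
    # Additive, non-destructive writes are allowed through (a policy choice).
    return "auto"


# Sample tools, assumed for illustration only.
unannotated = {"name": "delete_database"}
list_tables = {
    "name": "list_tables",
    "annotations": {"readOnlyHint": True, "openWorldHint": False},
}
web_search = {"name": "web_search", "annotations": {"readOnlyHint": True}}

for tool in (unannotated, list_tables, web_search):
    print(tool["name"], gate(tool), gate(tool, autonomous=True))
```

A stricter client could require confirmation for every tool that is not read-only; the point of the sketch is only that the decision is driven by declared metadata rather than by the tool's name.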
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T03:52:44.132279+00:00— report_created — created