Report #62172
[gotcha] Agent calls destructive MCP tools despite readOnlyHint annotation — annotations are hints, not guardrails
Never rely on MCP tool annotations for access control. Enforce restrictions at the server layer: reject destructive operations unless the request carries an explicit authorization token or the server is running in a permissive mode. Treat annotations only as a way to influence how the tool is framed to the LLM, and pair them with a system-level reminder such as: 'This tool modifies state. Confirm before calling.'
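The server-layer check described above can be sketched as follows. This is a minimal illustration in plain Python, not code from any MCP SDK: GuardedToolServer, Tool, ToolCallRejected, the token parameter, and the permissive flag are all hypothetical names chosen for this example, and a real server would obtain the token from its own transport or auth layer.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Optional


class ToolCallRejected(Exception):
    """Raised when a destructive call arrives without valid authorization."""


@dataclass(frozen=True)
class Tool:
    name: str
    handler: Callable[[dict], str]
    # Server-side fact about the tool. It is deliberately separate from any
    # annotation advertised to the client, which is only a hint.
    destructive: bool


class GuardedToolServer:
    """Hypothetical dispatcher that enforces authorization before running a tool."""

    def __init__(self, auth_token: str, permissive: bool = False) -> None:
        self._auth_token = auth_token
        self._permissive = permissive
        self._tools: dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, args: dict, token: Optional[str] = None) -> str:
        tool = self._tools[name]
        if tool.destructive and not self._permissive:
            # Compare as bytes in constant time to avoid leaking the token
            # through timing differences.
            valid = token is not None and hmac.compare_digest(
                token.encode(), self._auth_token.encode()
            )
            if not valid:
                raise ToolCallRejected(f"{name} requires an authorization token")
        return tool.handler(args)


server = GuardedToolServer(auth_token="s3cret")
server.register(Tool("list_files", lambda args: "listed", destructive=False))
server.register(Tool("delete_file", lambda args: "deleted", destructive=True))
```

With this setup, server.call("list_files", {}) succeeds without a token, server.call("delete_file", {}) raises ToolCallRejected, and the same call with token="s3cret" goes through. The decision depends only on server-side state, so it holds no matter what the model chooses to do with the annotations.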
Journey Context:
The MCP spec defines tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) and states that they are hints, not enforcement mechanisms. Nothing in the protocol blocks a call based on them, so a sufficiently motivated or confused LLM can ignore them. Developers see 'destructiveHint: true' and assume the tool is gated, but it isn't. At most, the annotation is metadata that the client may surface to the model or the user. If the LLM decides the task requires calling a destructive tool, it will call it. This is a category error: people treat annotations as permissions when they are merely labels.
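To make the label-versus-permission distinction concrete, here is a small sketch of a dispatcher that advertises an annotation but never consults it. The descriptor layout, delete_record, HANDLERS, and naive_dispatch are illustrative assumptions, not the wire format or API of any particular MCP implementation; only the annotation names come from the text above.

```python
from typing import Callable

deleted: list[str] = []


def delete_record(args: dict) -> str:
    # Stand-in for a real state-changing operation.
    deleted.append(args["id"])
    return f"deleted {args['id']}"


# What the server advertises about the tool. This is descriptive metadata only.
TOOL_DESCRIPTOR = {
    "name": "delete_record",
    "annotations": {"readOnlyHint": False, "destructiveHint": True},
}

HANDLERS: dict[str, Callable[[dict], str]] = {"delete_record": delete_record}


def naive_dispatch(name: str, args: dict) -> str:
    # Nothing in this path reads TOOL_DESCRIPTOR["annotations"], so the
    # hint has no effect on whether the handler runs.
    return HANDLERS[name](args)


result = naive_dispatch("delete_record", {"id": "42"})
```

The call succeeds even though the tool is labeled destructive, because the label lives in the descriptor and the execution path never looks at it. Any gating has to be added explicitly to the dispatch path.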
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:50:20.912631+00:00— report_created — created