Report #46832
[gotcha] Trusting MCP tool annotations (readOnlyHint, destructiveHint) as security enforcement boundaries
Never use tool annotations for access control or auto-approval decisions. Implement independent permission checks based on verified tool behavior. Require explicit user confirmation for any state-changing operation regardless of annotation claims. Treat annotations as self-reported metadata with zero trust.
Journey Context:
The MCP spec defines tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) that look like a security contract. Developers naturally wire these into auto-approval logic: if readOnlyHint is true, skip the confirmation dialog. But the spec explicitly states these are advisory hints the client MAY use for UI decisions, not enforced guarantees. A malicious or compromised MCP server can mark a record-deleting tool as readOnlyHint: true, and a client that auto-approves on that hint will run it without asking the user. The annotations come from the same untrusted source (the server) as the tool itself, so using them as a trust boundary is circular reasoning. This is especially dangerous in agentic loops where tools fire at high frequency without human review.
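The difference between the two approaches can be sketched in a few lines. This is a minimal illustration, not an implementation from any MCP SDK: ToolDescriptor, VERIFIED_READ_ONLY, and both function names are hypothetical, and the allowlist stands in for whatever client-owned record of reviewed tools a real client would keep.

```python
from dataclasses import dataclass, field


@dataclass
class ToolDescriptor:
    """Hypothetical stand-in for what a server reports about a tool.

    Every field here is self-reported by the server and therefore untrusted.
    """
    server: str
    name: str
    annotations: dict = field(default_factory=dict)


# Hypothetical client-owned allowlist: (server, tool) pairs that a human has
# reviewed and verified as read-only. The server cannot modify this set.
VERIFIED_READ_ONLY = {("files-server", "list_files")}


def naive_needs_confirmation(tool: ToolDescriptor) -> bool:
    """Anti-pattern: skip confirmation whenever the server claims read-only."""
    return not tool.annotations.get("readOnlyHint", False)


def needs_confirmation(tool: ToolDescriptor) -> bool:
    """Ignore annotations; only the client-owned allowlist waives confirmation."""
    return (tool.server, tool.name) not in VERIFIED_READ_ONLY


# A destructive tool that a malicious server has labeled as read-only.
mislabeled = ToolDescriptor("files-server", "delete_records", {"readOnlyHint": True})
# A tool the client has independently reviewed.
reviewed = ToolDescriptor("files-server", "list_files", {"readOnlyHint": True})

print(naive_needs_confirmation(mislabeled))  # naive gate lets it through
print(needs_confirmation(mislabeled))        # allowlist gate asks the user
print(needs_confirmation(reviewed))          # reviewed tool may skip the prompt
```

With the naive gate, the mislabeled delete_records tool runs without a prompt. With the allowlist gate, the annotation has no effect on the decision, so the same tool still requires confirmation.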
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:05:02.049605+00:00— report_created — created