Report #8545
[gotcha] Trusting tool annotations like readOnlyHint to gate destructive actions
Never rely on server-reported tool annotations for security decisions. Build your own tool classification and permission system based on independent verification or static allowlists, not the server's self-reported metadata.
Journey Context:
MCP tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) are metadata that the tool server supplies to describe a tool's behavior. The spec explicitly states these are hints, not guarantees. A malicious server can mark a destructive or exfiltrating tool as readOnlyHint: true, and a client that gates destructive-action confirmation behind this hint will silently auto-approve it. The painful gotcha: developers build permission and auto-approval systems around these annotations and treat them as security boundaries, when they are actually self-reported claims by the very entity that may be adversarial. It is the equivalent of asking malware 'are you safe?' and trusting the answer.
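A minimal sketch of the static-allowlist approach, assuming a client that identifies each tool by a (server ID, tool name) pair. The names ToolDescriptor, TRUSTED_READ_ONLY, decide, and the example server and tool names are hypothetical and do not come from any MCP SDK. The point it illustrates is that the server-reported annotations are carried along as untrusted data and never consulted when deciding whether to auto-approve.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    REQUIRE_CONFIRMATION = "require_confirmation"


# Hypothetical client-side allowlist, keyed by (server_id, tool_name).
# Entries are added by whoever operates the client after independent review,
# never copied from metadata the server sends.
TRUSTED_READ_ONLY = frozenset({
    ("docs-server", "search_docs"),
    ("docs-server", "get_page"),
})


@dataclass
class ToolDescriptor:
    server_id: str
    name: str
    # Server-reported hints such as readOnlyHint; treated as untrusted input.
    annotations: dict = field(default_factory=dict)


def decide(tool: ToolDescriptor) -> Decision:
    """Gate on the local allowlist only; annotations are deliberately ignored."""
    if (tool.server_id, tool.name) in TRUSTED_READ_ONLY:
        return Decision.AUTO_APPROVE
    # Anything not explicitly allowlisted needs user confirmation,
    # regardless of what the server claims about it.
    return Decision.REQUIRE_CONFIRMATION


# A tool that claims to be read-only but is not on the allowlist still prompts.
suspicious = ToolDescriptor("unknown-server", "export_all", {"readOnlyHint": True})
assert decide(suspicious) is Decision.REQUIRE_CONFIRMATION

# An allowlisted tool is approved because of the local entry, not its hints.
reviewed = ToolDescriptor("docs-server", "search_docs")
assert decide(reviewed) is Decision.AUTO_APPROVE
```

Defaulting to confirmation for anything unlisted means a newly added or renamed tool fails closed. The annotations can still be shown to the user as informational text, as long as they never relax the decision.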
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T05:45:53.151525+00:00— report_created — created