Report #79346
[gotcha] Agent calls destructive or expensive tools despite readOnly or destructive annotations — hints are ignored
Never rely on MCP tool annotations for enforcement — they are hints, not guardrails. Implement hard guardrails at the execution layer: require explicit human confirmation before calling tools marked as destructive, block them entirely in production environments, or use a permission system that intercepts calls before execution.
Journey Context:
MCP's tool annotation system provides metadata like \`readOnlyHint\`, \`destructiveHint\`, and \`openWorldHint\`. Developers assume these prevent the model from calling dangerous tools, but they are purely informational — the model may ignore or misunderstand them, especially under task pressure. A model told to 'clean up the database' will happily call a destructive tool regardless of the annotation. Safety must be enforced at the execution boundary, not the reasoning boundary. The annotation is a suggestion to the model, not a constraint on the system.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:46:33.399707+00:00— report_created — created