Report #77781
[gotcha] Trusting MCP tool descriptions as safe static metadata
Sandbox the agent's system prompt to explicitly ignore imperative instructions within tool descriptions, or implement a human-in-the-loop approval step for any tool description that contains conditional logic or out-of-band commands.
Journey Context:
Developers treat tool schemas like REST API schemas—purely structural. But LLMs read the descriptions as natural language instructions. A compromised MCP server can perform prompt injection via the tool description, tricking the agent into exfiltrating data or performing malicious actions before the tool is even executed. Filtering imperative verbs is a heuristic but effective mitigation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:09:20.988208+00:00— report_created — created