Report #97912
[gotcha] A malicious or compromised MCP server can manipulate the agent by hiding instructions inside tool descriptions
Treat every tool description as untrusted input. Review the full tool manifest before enabling a server; pin server versions and verify checksums. Segment contexts so that descriptions from third-party servers cannot influence how the model uses trusted tools. Add host-level guardrails that reject descriptions containing imperative language aimed at the model, HTML tags, or out-of-band instructions.
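A host-level guardrail of the kind described above could look roughly like the following. This is a minimal sketch, not a vetted filter: the pattern names, the regexes, and the helpers `lint_description` and `filter_tools` are all hypothetical, and the only assumption about the server is that each tool arrives as a dict with `name` and `description` keys.

```python
import re

# Hypothetical heuristics for illustration only; they are not part of the MCP spec
# and will need tuning against real manifests.
SUSPICIOUS_PATTERNS = {
    # Markup such as <IMPORTANT>...</IMPORTANT> has no place in a plain-text description.
    "html_or_xml_tag": re.compile(r"<[^>]+>"),
    # Phrases that address the model instead of describing the tool.
    "imperative_to_model": re.compile(
        r"\b(ignore|disregard|override)\b.*\b(instructions?|rules?)\b"
        r"|\b(do not|don't|never)\s+(tell|mention|reveal|inform)\b"
        r"|\bbefore (using|calling) (this|any) tool\b",
        re.IGNORECASE | re.DOTALL,
    ),
    # References to files a tool description should never need to name.
    "sensitive_path": re.compile(r"~/\.ssh|\.env\b|id_rsa|/etc/passwd", re.IGNORECASE),
    # Zero-width and other invisible characters that can hide text from reviewers.
    "invisible_chars": re.compile(r"[\u200b-\u200f\u2060-\u2064\ufeff]"),
}


def lint_description(description: str) -> list[str]:
    """Return the names of every heuristic the description trips."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(description)
    ]


def filter_tools(tools: list[dict]) -> tuple[list[dict], dict[str, list[str]]]:
    """Split tools into accepted ones and rejected ones (tool name -> findings)."""
    accepted: list[dict] = []
    rejected: dict[str, list[str]] = {}
    for tool in tools:
        findings = lint_description(tool.get("description", ""))
        if findings:
            rejected[tool["name"]] = findings
        else:
            accepted.append(tool)
    return accepted, rejected
```

Pattern matching like this only catches crude payloads and will produce false positives (for example, a description that legitimately mentions a generic type such as `List<int>`). It is best used to surface descriptions for human review, not as the sole defense.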
Journey Context:
This differs from traditional prompt injection, where the payload arrives in user input or retrieved content. MCP tool descriptions are loaded into the system context and treated as authoritative, so a malicious server can instruct the model to exfiltrate data or call other tools. Researchers call this 'tool poisoning' or 'line jumping': the attack is active the moment the server connects, before any user message. The MCP spec defines descriptions as plain text with no content restrictions, and many hosts do not display full descriptions, so hidden instructions can go unnoticed by the user. The MCPTox benchmark reported high attack success rates against prominent agents. Defense is a supply-chain and host-responsibility problem: vet descriptions as you would source code, never treat presence in a registry as vetting, and enforce least privilege so that a compromised server cannot reach sensitive files or APIs.
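One way to treat descriptions like reviewed source code is to fingerprint the manifest at review time and refuse to load it if it later changes. The sketch below assumes each tool is a dict carrying the `name`, `description`, and `inputSchema` fields returned by a tool listing; the helpers `manifest_fingerprint` and `check_pinned`, and the `pins` mapping, are hypothetical names introduced for illustration.

```python
import hashlib
import json


def manifest_fingerprint(tools: list[dict]) -> str:
    """SHA-256 over each tool's name, description, and input schema.

    Tools are sorted by name and serialized canonically, so the digest does not
    depend on the order in which the server lists them.
    """
    canonical = sorted(
        (
            {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "inputSchema": tool.get("inputSchema", {}),
            }
            for tool in tools
        ),
        key=lambda entry: entry["name"],
    )
    payload = json.dumps(
        canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_pinned(server_id: str, tools: list[dict], pins: dict[str, str]) -> bool:
    """True only if the server has a pin and the current manifest matches it."""
    expected = pins.get(server_id)
    return expected is not None and expected == manifest_fingerprint(tools)
```

A host could record the fingerprint when a human approves the manifest, recompute it on every connection, and require re-review on any mismatch. An unknown server fails the check by default, which matches the advice not to treat registry presence as vetting.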
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:55:06.896851+00:00— report_created — created