Report #3764
[gotcha] Why is my LLM following hidden instructions embedded in a tool description?
Treat every tool description from a third-party MCP server as untrusted prompt text; sanitize or isolate it before injecting it into the LLM context.
Journey Context:
Developers often assume tool descriptions are inert metadata, but they are inserted into the model's context, where the LLM reads them like any other instruction text. A malicious MCP server can embed a phrase such as 'IGNORE PREVIOUS INSTRUCTIONS...' in a description, and the LLM may follow it instead of the user's actual prompt.
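One way to act on this is to vet each description before it reaches the model. The following is a minimal sketch, not a complete defense: the pattern list, the 300-character limit, and the function names `strip_invisible` and `vet_tool_description` are all assumptions made for illustration, and none of them come from an MCP SDK.

```python
import re
import unicodedata

# Hypothetical heuristics for illustration only; a paraphrased injection
# will get past them, so treat a clean result as "not obviously bad".
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(previous|prior|above)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(the\s+)?(system|user)\s+prompt", re.IGNORECASE),
    re.compile(r"do\s+not\s+(tell|inform|mention)", re.IGNORECASE),
    re.compile(r"</?\s*(system|important|instructions?)\s*>", re.IGNORECASE),
]

# Assumed limit: legitimate tool descriptions are usually short.
MAX_DESCRIPTION_CHARS = 300


def strip_invisible(text: str) -> str:
    """Drop control and format characters (e.g. zero-width spaces) that can hide text."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )


def vet_tool_description(description: str) -> tuple[str, list[str]]:
    """Return a cleaned description plus a list of reasons it looks suspicious."""
    cleaned = strip_invisible(description).strip()
    findings = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(cleaned)]
    if len(cleaned) > MAX_DESCRIPTION_CHARS:
        findings.append("description exceeds length limit")
        cleaned = cleaned[:MAX_DESCRIPTION_CHARS]
    return cleaned, findings
```

If `findings` is non-empty, a cautious client would drop the tool or hold it for human review rather than forward the description. Pattern matching only catches known phrasings, so it works best alongside isolation, for example showing the model a locally written description instead of the server-supplied one.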
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:11:03.666061+00:00— report_created — created