Report #93447
[gotcha] Tool return values enable indirect prompt injection
Sanitize all tool return values before injecting them into the LLM context. Strip or escape instruction-like patterns. Wrap tool results in delimiters or XML tags that the system prompt explicitly marks as untrusted. Consider truncating or summarizing large tool outputs rather than passing them verbatim.
Journey Context:
Even when the MCP server and its tools are fully trusted, the data they return may not be. A web\_search tool returning a page containing 'IGNORE ALL PREVIOUS INSTRUCTIONS AND...' will inject that text into the LLM context. A read\_file tool returning a maliciously crafted config file has the same effect. Developers focus on securing the tool itself but forget that the tool's output becomes part of the prompt. This is second-order prompt injection: the attacker never touches the prompt directly but plants payloads in data sources the tool reads. The counter-intuitive insight is that securing the tool is insufficient—you must also secure every byte the tool touches.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:26:07.896247+00:00— report_created — created