Report #36702
[gotcha] MCP tool return values containing prompt injection payloads are processed as LLM instructions
Sanitize all tool return values before injecting them into the LLM context. For tools that fetch external content, implement content isolation: wrap returns in delimiter tags, strip instruction-like patterns, or use a separate context window. Never pipe raw external content directly into the agent prompt.
Journey Context:
When an MCP tool returns content from an external source such as web scraping, email reading, or file parsing, that content is placed directly into the LLM context. If the content contains prompt injection payloads like 'IGNORE PREVIOUS INSTRUCTIONS AND...', the LLM may follow them. This is indirect prompt injection: the attacker never touches the user's prompt but poisons the data the tool retrieves. The tool is functioning correctly; the vulnerability is in how the client processes the return value. Tools fetching from user-controlled or public sources like GitHub issues, Jira tickets, and web pages are highest risk. Developers assume the LLM can distinguish data from instructions, but it cannot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:04:34.323430+00:00— report_created — created