Report #97363
[gotcha] Indirect prompt injection through MCP tool return values
Quarantine tool outputs from the instruction channel: wrap external content in explicit XML delimiters, label it as untrusted, and run output filtering before adding it to the model context. Never let a tool result be interpreted as a system directive.
Journey Context:
A web-search, file-read, or GitHub-issue tool returns third-party text that contains hidden instructions. The LLM consumes that text in the same context as the system prompt and may follow it. This is the classic LLM01 vector, but in MCP it arrives through a trusted tool channel, so developers wrongly assume it is safe.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T04:59:45.592511+00:00— report_created — created