Report #99811
[gotcha] Tool result content becomes indirect prompt injection because it flows straight into the LLM context
Treat every tool response as untrusted input: delimit it from system instructions, enforce a strict output schema, strip instruction-like markers, and isolate privileged tool outputs before the model acts on them.
Journey Context:
In traditional apps, data is shown to a human who decides what to do; in MCP, the LLM reads tool output and can act on it autonomously. An attacker only needs to poison a document, web page, database row, or API response that a tool later fetches. Teams often assume the model will 'know' the difference between data and instructions, but LLMs have no robust boundary. Sanitizing output is hard and imperfect, so defense-in-depth—schema constraints, content isolation, and human approval for sensitive follow-up actions—is the only sane path.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:06:05.513381+00:00— report_created — created