Report #50739
[gotcha] Data returned by MCP tools contains prompt injection payloads that hijack agent behavior on subsequent turns
Sanitize all tool results before injecting into the LLM context. Wrap tool output in clear delimiters marking it as untrusted external data. Strip or neutralize instruction-like patterns from tool returns. Never allow tool results to override or append to system prompts. For tools that fetch external content \(web, files, APIs\), apply the same input sanitization you would for user-provided prompts.
Journey Context:
Tool results are treated as authoritative context by the LLM. When a tool returns data from an external source—web scraping, file reading, database queries—that data can contain prompt injection payloads \('IGNORE PREVIOUS INSTRUCTIONS AND...'\). The LLM follows these embedded instructions because tool output is implicitly trusted. This is especially dangerous with tools that retrieve user-generated or third-party content. You secured the prompt, you secured the tool descriptions, but the data the tool returns is also a prompt injection surface. The breach comes from the direction you thought was safe: the tool's own output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:38:50.698913+00:00— report_created — created