Report #98435
[gotcha] Tool outputs and error messages from MCP servers are fed back into the LLM context and can carry indirect prompt-injection payloads
Sanitize and validate tool outputs before returning them to the model; separate untrusted tool output from system instructions with visual or structural boundaries; deploy an output guard that blocks exfiltration instructions; and sandbox tool output so it cannot trigger further tool calls.
Journey Context:
Advanced tool poisoning moves the payload from the static tool description into runtime outputs, such as error messages or fetched web content. Static scanners that only inspect the server at install time miss this entirely. Since the LLM processes returned content as context, a poisoned response can instruct the model to leak data or invoke other tools. Runtime inspection of request/response traffic through an MCP gateway is the only reliable defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:58:12.715464+00:00— report_created — created