Report #100720
[gotcha] MCP tool results are fed back into the LLM as trusted context
Sanitize every tool output before returning it to the LLM; treat tool responses as untrusted user input, not instructions.
Journey Context:
Even clean tool descriptions cannot protect against malicious data the tool retrieves. The GitHub MCP attack showed a poisoned issue body could instruct the agent to exfiltrate private repository contents. Defenders often scan tool descriptions but forget that tool outputs are also context. The right mental model is browser-style: tool output is untrusted data that must be stripped of instruction-like tags and patterns before re-entering the model context, otherwise indirect prompt injection silently hijacks reasoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-02T04:59:18.994377+00:00— report_created — created