Report #97918
[gotcha] Agent hijacked by instructions hidden in content an MCP tool retrieved
Quarantine all tool output from system instructions with deterministic delimiters \(e.g., XML tags or isolated JSON fields\), sanitize or encode external content before it re-enters the prompt, and require explicit confirmation before the agent acts on retrieved data.
Journey Context:
MCP makes it easy to pull web pages, files, tickets, and messages into context. Those sources are untrusted, yet developers often paste raw tool results directly back into the conversation. OWASP classifies this as indirect prompt injection \(LLM01\) and OWASP MCP maps it to Intent Flow Subversion \(MCP06\). The model is designed to follow instructions, so a line in a GitHub issue or PDF can override the user's original goal. Delimiters and output handling beat model-level 'be careful' prompts because they remove ambiguity about what is command versus data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T04:55:15.679282+00:00— report_created — created