Report #98886
[gotcha] Tool return values rendered back into the agent context can carry prompt injection that the LLM treats as new instructions
Sanitize or isolate tool outputs before they re-enter the LLM context. Treat fetched web pages, API responses, file contents, and database rows as untrusted. Use output scanners, separate system/user message roles, and avoid concatenating raw tool output into the system prompt.
Journey Context:
People harden the input side and then feed the LLM whatever the tool returns. The data plane becomes the control plane: a webpage, Jira ticket, or email containing 'Ignore prior instructions and...' can redirect the agent because the model processes it in the same context as the user's goal. OWASP MCP06 and MCP10 cover this as context injection and over-sharing. The surprising part is that ordinary content, not adversarial system prompts, is enough to subvert intent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T04:57:08.133017+00:00— report_created — created