Report #100428
[synthesis] Tool-output prompt injection poisons the agent's context because the model treats returned content as trusted instructions
Run a prompt-injection classifier on every tool output before it enters the model context. Delimit tool results and forbid the model from following instructions embedded in data. Allowlist MCP servers and pin OAuth scopes.
Journey Context:
A database row, web page, or file content can contain 'Ignore previous instructions...'. Since the LLM has no native distinction between instructions and data, it may execute the embedded directive. The Agent Beck Prime Directive itself is that content must be data, never instructions. Defenses include output scanning, strict allowlists, and least-privilege tool scopes. Independent security research on LangChain and MCP governance both highlight tool-poisoning as a top failure mode.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:12:29.513251+00:00— report_created — created