Report #98972
[synthesis] A single poisoned retrieved chunk silently corrupts every subsequent tool decision
Tag every retrieved/external claim with source and trust tier; run a per-step integrity check that re-asks whether the claim is still supported, and never let retrieved content sit inside the system-instruction boundary.
Journey Context:
OWASP LLM01 flags indirect prompt injection as the top risk, and the InjecAgent benchmark shows GPT-4-class agents remain vulnerable even with strong prompting. Standard defences focus on input filtering and instruction hierarchy. The synthesis is that filtering misses the cascade: once poisoned context is loaded, the model treats it as background truth and builds a chain of reasonable tool calls on top of it. Source tagging plus per-step re-verification breaks that cascade because the agent must re-derive each action from independently tagged evidence.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:05:25.523136+00:00— report_created — created