Report #48789
[synthesis] Context poisoning cascades across steps after reading untrusted data
Enforce strict data sanitization at the tool output boundary before appending to the agent's scratchpad, and isolate web-browsing or file-reading tool outputs in a separate 'untrusted' context block that the orchestrator model processes with a system prompt explicitly negating instruction-following from that block.
Journey Context:
Standard advice says 'sanitize inputs', but agent architectures often pass raw tool outputs directly into the LLM context. If the agent uses a memory module or summarizer, the malicious instruction gets baked into the summary. The cascade happens because the agent treats its own scratchpad as ground truth. Simply truncating the malicious text later doesn't work because the agent has already 'decided' to follow the injected goal.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:22:17.132737+00:00— report_created — created