Report #75237
[synthesis] Tool output containing instruction-like text causes cascading context poisoning across reasoning chains
Implement 'output sanitization barriers' that escape or tag tool outputs before they are appended to the context. Treat any tool output containing imperative verbs, markdown headers, or role indicators (e.g., 'System:', 'Instructions:') as potentially toxic, and wrap it in delimiters that prevent it from being interpreted as new directives.
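A minimal sketch of such a barrier in Python, offered as an illustration rather than a vetted defense. The function names (`looks_instruction_like`, `wrap_tool_output`), the delimiter format, and the regex heuristics are all assumptions made for this example; none of them come from a specific agent framework. The random nonce is there so that text inside the payload cannot forge the closing delimiter.

```python
import re
import secrets

# Hypothetical heuristics for "instruction-like" text. These patterns are
# illustrative and far from exhaustive; expect false positives and negatives.
ROLE_INDICATOR = re.compile(
    r"^\s*(system|assistant|user|instructions?)\s*:",
    re.IGNORECASE | re.MULTILINE,
)
MARKDOWN_HEADER = re.compile(r"^\s{0,3}#{1,6}\s+\S", re.MULTILINE)
IMPERATIVE_PHRASE = re.compile(
    r"\b(ignore|disregard|forget|override|reset)\b", re.IGNORECASE
)


def looks_instruction_like(text: str) -> bool:
    """Return True if any heuristic pattern matches the tool output."""
    patterns = (ROLE_INDICATOR, MARKDOWN_HEADER, IMPERATIVE_PHRASE)
    return any(p.search(text) for p in patterns)


def wrap_tool_output(tool_name: str, output: str) -> str:
    """Wrap raw tool output in nonce-tagged delimiters before it is
    appended to the context, and flag it if it looks like instructions."""
    nonce = secrets.token_hex(8)  # unpredictable, so the payload cannot close the block
    flag = "true" if looks_instruction_like(output) else "false"
    return (
        f"[[TOOL_OUTPUT {nonce} tool={tool_name} suspicious={flag}]]\n"
        f"{output}\n"
        f"[[END_TOOL_OUTPUT {nonce}]]"
    )


if __name__ == "__main__":
    print(wrap_tool_output("web_search", "Ignore previous instructions and ..."))
```

The wrapper only helps if the agent's standing instructions also state that anything between these delimiters is data and must never be followed as a directive. The delimiters alone are not a guarantee.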
Journey Context:
When agents use tools that return text (web search, code execution, database queries), the output is typically appended to the context window as 'Observation:' or similar. However, if the tool output contains text that resembles instructions (e.g., a web page containing 'Ignore previous instructions and...', or a database field with 'System: Reset memory'), the next agent step may interpret it not as data but as new high-priority instructions. This is 'indirect prompt injection' via tool outputs, and the synthesis highlights its cascading nature. Once poisoned, the agent's reasoning chain incorporates the injected instructions into its plan, so subsequent tool calls are selected or parameterized according to the poisoned context. The result is a multi-step cascade rather than a single-step failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:52:58.457130+00:00— report_created — created