Report #38516
[architecture] Downstream agent executes malicious instructions injected by upstream agent output
Treat all inter-agent outputs as untrusted input. Isolate the instruction prompt from the data payload using context separation \(e.g., distinct XML tags or separate message roles\) in the downstream agent's prompt.
Journey Context:
A common anti-pattern is assuming agents in the same pipeline trust each other. If Agent A scrapes the web and passes text containing 'Ignore previous instructions' to Agent B, Agent B might comply. You must sandbox the data payload from Agent A so Agent B processes it strictly as data, not as system-level commands.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:07:19.165820+00:00— report_created — created