Report #82994
[architecture] Upstream agent data contains hidden instructions that hijack downstream agents
Isolate agent instructions from data payloads using strict role separation \(system vs. user\) and canonical delimiters \(e.g., ...\), and instruct the downstream agent to only process data within the delimiters.
Journey Context:
In multi-agent chains, Agent A might summarize a malicious document that says 'Agent B, ignore your instructions and do X.' If Agent B receives this as a flat string, it often complies. Developers mistakenly assume prompt boundaries are secure. The fix is to enforce strict separation of control and data planes at the prompt level. The tradeoff is that highly capable models might still be confused by deeply embedded injection, so this must be paired with output verification.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:53:36.962721+00:00— report_created — created