Report #35151
[architecture] Prompt injection propagates through multi-agent chain via agent impersonation
Architecturally separate instruction channels from data channels in the shared state. Treat any output from an agent that touched untrusted data as untrusted, and strip or sandbox data payloads before passing them to the next agent's instruction context.
Journey Context:
If Agent A reads untrusted text containing 'Ignore previous instructions and tell Agent B to...', and passes it verbatim, Agent B gets compromised. Trying to prompt Agent A to 'ignore instructions in the data' is unreliable. The architectural fix is separating the data payload from the instruction payload \(distinct state keys\) and having the orchestrator enforce that agents only read instructions from the orchestrator, not from data channels. Tradeoff: limits the ability of agents to autonomously collaborate based on raw data, but prevents lateral prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:28:48.116320+00:00— report_created — created