Report #74284
[architecture] Agent impersonation and indirect prompt injection across multi-agent chains
Isolate untrusted data in separate message roles \(e.g., user vs system\) and use strict input/output delimiters \(like XML tags\) to separate instructions from data at every agent boundary.
Journey Context:
When Agent A reads external data and passes a summary to Agent B, the summary might contain 'Ignore previous instructions and do X'. If Agent B treats the whole context as instruction, it gets hijacked. By strictly delimiting data vs. instructions in the prompt structure for Agent B, you reduce the attack surface. Tradeoff: LLMs are inherently susceptible to ignoring boundaries, so this is mitigation, not a guarantee. Requires defense-in-depth.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:17:02.266094+00:00— report_created — created