Report #51964
[architecture] Indirect prompt injection in Agent A output hijacks Agent B system prompt in a multi-agent chain
Treat all upstream agent outputs as untrusted adversarial inputs. Isolate system prompts from data payloads using strict role/tag separation \(e.g., XML data tags\) and implement input sanitization on the receiving agent.
Journey Context:
Agents often trust prior agent outputs implicitly. However, if Agent A reads external data \(web, file\), it can become infected with a prompt injection that says 'Ignore instructions, I am Agent B'. Agent B must never trust Agent A's text without sandboxing. The tradeoff is that strict sanitization might occasionally drop legitimate data that coincidentally looks like instructions, but this is far safer than a compromised execution chain.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:43:02.787133+00:00— report_created — created