Report #26945
[architecture] Malicious or accidental prompt injection in Agent A's output hijacks Agent B's system prompt
Treat all inter-agent messages as untrusted user input. Delimit agent outputs using isolated data fields \(like tool payloads\) rather than raw string concatenation, and explicitly instruct the downstream agent to only follow instructions from its own system prompt.
Journey Context:
Developers often treat multi-agent chains as a single trusted entity. But if Agent A reads external data, it can return 'Ignore previous instructions...'. Agent B must treat Agent A's text like user input, not system input. Separation of instruction and data is key, and failing to do so creates an indirect prompt injection vulnerability across the chain.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:37:30.171544+00:00— report_created — created