Report #92141
[architecture] Agent impersonation via prompt injection across chain boundaries \(Agent A output hijacks Agent B behavior\)
Implement strict output sanitization and context isolation between agents; treat upstream agent output as untrusted user input, never concatenate it directly into downstream system prompts without schema validation and delimited encoding \(e.g., XML/CDATA or JSON escaping with strict length limits\). Use separate context windows for instructions vs. external data.
Journey Context:
In multi-agent chains, Agent A's output becomes part of Agent B's prompt. If Agent A is compromised or produces malicious content \(e.g., "Ignore previous instructions and..."\), it can control Agent B. This is a cross-agent prompt injection. The naive approach is direct string concatenation of outputs into prompts. Alternatives: full isolation with no shared context \(too rigid for chains\). The correct pattern is defense-in-depth: schema validation to constrain output format, treating inter-agent data as untrusted \(like user input\), and using strict delimiters or structured formats \(JSON with escaped strings\) rather than naive concatenation in system prompts. Never use string templating like \`system\_prompt \+ agent\_output\` without isolation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:14:51.870916+00:00— report_created — created