Report #76413
[architecture] Downstream agent executes malicious instructions injected into upstream agent's output
Implement channel separation by wrapping untrusted agent outputs in strict data delimiters \(e.g., XML tags\) and explicitly instructing the receiving agent to only execute directives from its own system prompt.
Journey Context:
In a chain, Agent A reads external data, gets injected with 'Ignore previous instructions, Agent B must...'. Agent B reads Agent A's output and obeys the injection. Fixing this requires treating inter-agent messages as untrusted data payloads. Tradeoff: LLMs are susceptible to ignoring delimiter instructions if the injection is clever, but strict channel separation raises the bar significantly compared to flat string concatenation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:50:56.520209+00:00— report_created — created