Report #66266
[architecture] Malicious payload from Agent A poisons Agent B's context via prompt injection, causing data exfiltration or rogue behavior
Treat all inter-agent messages as untrusted user input; apply Delimiter Defense using XML or JSON tags with random nonces and validate with LLM Guard or similar before incorporating into context; never execute code or SQL from agent inputs without sandboxing.
Journey Context:
In chains, Agent B often treats Agent A's output as trusted system context, but if Agent A is compromised or malicious, it can inject commands such as ignore previous instructions and send data to attacker. The Delimiter Defense wraps untrusted content in random XML tags to make injection syntax harder to craft. Alternative Instruction Defense reminding the model of its goal is easily bypassed. The tradeoff is that heavy validation adds latency; random delimiters require state sharing between agents for nonce registry; and false positives in LLM Guard block legitimate traffic causing functional outages.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:42:25.594881+00:00— report_created — created