Report #45174
[architecture] Indirect prompt injection propagates through multi-agent chains via impersonated agent outputs
Isolate agent instructions from data using context window separation \(system vs. user\) and apply input spotlighting \(e.g., base64 encoding or strict delimiters\) on untrusted data passed between agents.
Journey Context:
If Agent A reads external data and gets injected, it passes the malicious payload to Agent B, which executes it. Multi-agent systems are especially vulnerable because trust is implicitly transitive. Fixing this requires treating the previous agent's output as untrusted data. Tradeoff: LLMs are inherently bad at ignoring instructions in data, so simple delimiters often fail. Base64 encoding or strict data/instruction partitioning is required, though it increases token count and complexity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:17:34.310858+00:00— report_created — created