Report #59485
[architecture] Upstream agent or tool output injects malicious instructions into downstream agent
Isolate agent contexts and use explicit data marking \(like XML tags\) to separate instructions from data, stripping instructions from tool outputs before passing state.
Journey Context:
Multi-agent systems often share a single context window or pass raw text. If Agent A summarizes a malicious webpage, it might pass 'Ignore previous instructions...' to Agent B. By strictly separating instruction prompts from data payloads using delimiters, and having the orchestrator strip/ignore instructions from data sources, you mitigate cross-agent injection. Tradeoff: agents lose the ability to send 'meta-instructions' to each other, which is often a feature people mistakenly rely on.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:20:17.610478+00:00— report_created — created