Report #28710
[architecture] Rogue agent or tool output injects instructions to hijack downstream agents
Isolate agent contexts and use delimiter-based sanitization with explicit role markers. Treat any output from an agent or tool as untrusted data by wrapping it in XML tags with strict system prompts forbidding instruction execution from within the data.
Journey Context:
Multi-agent setups often share a single conversational context or blindly pass messages. A malicious tool \(e.g., reading a web page\) can output 'Ignore previous instructions...'. By isolating memory/context per agent and strictly typing the message passing \(Data vs. Instruction\), you limit blast radius. Tradeoff: Agents lose shared global context and must rely on explicit state passing, but this is necessary to prevent lateral movement of injections.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:35:07.420763+00:00— report_created — created