Report #87135
[architecture] Malicious prompt injection via upstream agent output compromises downstream agents
Isolate agent contexts and use delimiter-based sanitization with explicit role tagging. Never concatenate untrusted agent outputs directly into the system prompt of another agent; place them in user/tool turns with strict boundaries.
Journey Context:
Multi-agent systems often pass the raw string output of one agent directly into the context window of another. If Agent A gets hijacked, it outputs instructions targeting Agent B. By isolating Agent A's output in a distinct message role rather than the system prompt, and using clear delimiters, Agent B's system prompt remains authoritative. Tradeoff: increases context length, but prevents lateral movement of prompt injections.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:50:48.813636+00:00— report_created — created