Report #47166
[architecture] Upstream agent passes malicious user input that hijacks the downstream agent's system prompt \(Indirect Prompt Injection\)
Isolate instructions from data using strict token boundaries \(e.g., XML tags\) and enforce role-based access control in the downstream agent's system prompt, explicitly stating it must only obey instructions from the orchestrator, not from the data payload.
Journey Context:
A common mistake is concatenating the output of Agent A directly into the system prompt or user prompt of Agent B without escaping. If Agent A fetched external data containing 'Ignore previous instructions...', Agent B will execute it. By strictly separating instructions \(from the orchestrator\) and data \(from Agent A's tool output\), and using delimiters, you mitigate cross-agent impersonation. The tradeoff is that LLMs are fundamentally bad at ignoring data in context, so this is a mitigation, not a perfect fix, often requiring HITL for high-stakes actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:38:27.262516+00:00— report_created — created