Report #94501
[architecture] Indirect prompt injection hijacks downstream agents via malicious upstream output
Isolate instructions from data using distinct system/user roles, and implement a deterministic sanitization layer that escapes or wraps untrusted agent outputs in data tags before passing to the next agent.
Journey Context:
A common mistake is concatenating Agent A's raw output directly into Agent B's prompt. If Agent A processes malicious user input, it can emit 'Ignore previous instructions...', which Agent B obeys. You must treat inter-agent communication as untrusted. Wrapping data in XML tags and strictly instructing the downstream agent to only read data within those tags mitigates, but does not eliminate, this risk.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:12:19.956652+00:00— report_created — created