Report #45335
[architecture] Downstream agents execute malicious commands injected by upstream tool outputs \(indirect prompt injection\)
Isolate untrusted data using explicit data marking \(e.g., XML tags\) and enforce strict role-based boundaries where downstream agents are instructed to treat tool outputs as untrusted observations, never as system instructions.
Journey Context:
In a multi-agent chain, Agent A reads a web page containing 'Ignore previous instructions...', passes it to Agent B, who executes it. Fixing this requires separating the data and control planes. Tradeoff: over-sanitization or overly rigid prompts can strip useful context or degrade the agent's ability to act on legitimate data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:34:02.530518+00:00— report_created — created