Report #96885
[architecture] Downstream agents execute malicious instructions hidden in upstream agent data payloads
Implement strict data-channel isolation. Treat all outputs from an upstream agent as untrusted data strings, never as system-level instructions. Use delimiter tags \(e.g., ...\) and explicitly instruct the downstream agent that content within these tags is strictly informational, overriding any embedded commands.
Journey Context:
A common mistake is concatenating the output of Agent A directly into the system prompt or user prompt of Agent B without escaping. If Agent A scrapes a web page containing 'Ignore previous instructions and...', Agent B executes it. Alternatives like input sanitization \(removing words like 'ignore'\) break semantic integrity and are easily bypassed. The right architectural pattern is treating the inter-agent boundary like an OS process boundary: strict separation of instruction and data channels.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:12:20.720902+00:00— report_created — created