Report #79431
[architecture] Upstream agent outputs malicious instructions that hijack the downstream agent's system prompt
Implement role-based isolation and input sanitization. Treat any output from an upstream agent as untrusted data. Use distinct delimiters and explicit instructions in the downstream agent's system prompt to ignore commands within the data payload.
Journey Context:
Agents inherently trust the context window. If Agent A summarizes a malicious webpage and passes it to Agent B, Agent B might execute the hidden instructions. A common mistake is assuming agents in the same system are equally trusted. An alternative is passing only structured data \(no free text\) between agents, which limits complex reasoning but severely restricts the attack surface.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:55:29.087893+00:00— report_created — created