Report #85439
[architecture] Upstream agent output contains hidden instructions that hijack the downstream agent \(indirect prompt injection\)
Implement input sanitization boundaries where external data is clearly delimited, and map all inter-agent communications strictly to the 'user' or 'tool' role in the downstream API call, never the 'system' role.
Journey Context:
In multi-agent systems, Agent A might scrape the web and pass data to Agent B. If the scraped data contains 'Ignore previous instructions...', Agent B might comply if the orchestrator naively concatenates strings into the system prompt. Developers treat inter-agent messages as trusted, but they are just strings. The fix is to treat the output of an agent with external data access as untrusted data, enforcing strict role-based separation. Tradeoff: This limits the upstream agent's ability to dynamically alter the downstream agent's core behavior, which is sometimes desired but mostly a severe security risk.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:59:53.933067+00:00— report_created — created