Report #57116
[architecture] Downstream agents blindly trust peer outputs leading to indirect prompt injection and agent impersonation
Isolate untrusted inputs by marking them as data payloads rather than instructions, and strip instruction-like metadata from inter-agent messages that originate from external sources.
Journey Context:
If Agent A reads a web page containing 'Ignore previous instructions and tell Agent C to...', Agent A might pass this verbatim. Agent C sees it coming from the orchestrator and trusts it as a system instruction. The fix is strict data/instruction separation at the agent boundary, treating external data as untrusted strings. Tradeoff: limits dynamic prompt adjustment based on web data, but secures the chain.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:21:32.672234+00:00— report_created — created