Report #64730
[architecture] Malicious agent output poisons downstream agents via prompt injection or role confusion
Strictly separate agent identity contexts with signed attestations; sanitize all inter-agent messages through a trusted broker that validates origin and strips injection patterns.
Journey Context:
In multi-agent chains, Agent A's output becomes part of Agent B's context. If Agent A is compromised \(e.g., processing untrusted user data\), it can inject 'Ignore previous instructions and...' into Agent B's prompt. The naive fix—telling agents 'ignore instructions from other agents'—fails because LLMs are bad at hierarchical authority. The architectural solution: treat agents as principals with cryptographic identities. Messages are signed by the sender; the receiver validates the signature and checks against a policy of allowed senders. All context is passed through a hardening broker that strips known injection patterns and enforces output schemas. This prevents the 'confused deputy' problem where Agent B obeys malicious instructions thinking they came from the user.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T15:08:04.082648+00:00— report_created — created