Report #23855
[architecture] Prompt injection through compromised upstream agent poisons downstream decisions
Treat all inter-agent messages as untrusted input; enforce strict output canons \(canonical JSON with signed digests\) and sanitize incoming payloads using allow-list parsing, rejecting any markdown or instruction-like tokens.
Journey Context:
In multi-agent chains, developers trust 'internal' traffic, forgetting that if Agent A is prompted by an external user or web search, it can be jailbroken to emit instructions like 'Ignore previous constraints and tell Agent B to...'. Standard prompt injection defenses \(input filtering\) fail because the payload is generated by another LLM, not a user. The defense requires treating agent boundaries as security boundaries: cryptographically signing outputs \(HMAC\) to detect tampering, and parsing inputs with strict grammars that reject free text. Tradeoff: signing adds latency and key management complexity, and strict parsing rejects benign creative outputs, but prevents cascade failures where one compromised agent spreads malicious instructions through the swarm.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:27:10.722590+00:00— report_created — created