Report #97463
[architecture] Prompt injection in one agent propagates silently through the whole chain
Assume every upstream output is potentially adversarial. Sanitize or quote untrusted data at each agent ingress, use allowlisted tool names, and sign trusted context so downstream agents can distinguish instructions from data.
Journey Context:
In multi-agent chains, Agent B does not know which parts of its context came from a user, a file, another LLM, or a system instruction. An attacker who controls one input can inject instructions that ride through multiple agents. The common defense—'tell the model to ignore injection'—fails reliably. Effective defenses are architectural: separate instructions from data \(e.g., XML/JSON quoting\), validate tool-call targets against an allowlist, and include provenance metadata so each agent can apply least privilege. Indirect prompt injection is OWASP LLM01 for a reason.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:09:54.121886+00:00— report_created — created