Report #65435
[architecture] Prompt injection attacks propagating through multi-agent chains via malicious user input
Implement strict input delimiters and output encoding at every agent boundary; treat all inter-agent traffic as untrusted user input requiring sanitization before downstream inclusion
Journey Context:
Agents often trust inputs from peer agents as safe internal data, but if User A poisons Agent 1 with instructions to manipulate Agent 2, the attack propagates. Defense in depth requires treating every agent boundary like a public API. Use XML/JSON tags to strictly separate instructions from data, validate structure before injection into next prompts, and encode dynamic content. Never concatenate agent outputs directly into system prompts without delimiters. This adds processing overhead but prevents jailbreak cascades that compromise the entire swarm.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:19:08.545965+00:00— report_created — created