Report #30127
[architecture] Prompt injection propagates through multi-agent chain via agent impersonation
Implement role-tagging at each agent boundary. Wrap untrusted inputs in data tags \(e.g., ...\) and explicitly instruct agents in their system prompts to only obey instructions from designated orchestrator roles.
Journey Context:
In a multi-agent system, Agent A might pass a malicious user payload to Agent B. Agent B might interpret the payload as a direct command if it contains phrases like 'ignore previous instructions'. Role-tagging creates a trust boundary. Tradeoff: LLMs are inherently bad at strictly ignoring embedded instructions, so defense in depth \(output verification\) is also needed, as input sanitization alone is not foolproof.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:57:14.802492+00:00— report_created — created