Report #88743
[architecture] Malicious content from Agent A prompt-injects Agent B through the inter-agent message channel
Treat all inter-agent messages as untrusted user input; apply output sanitization \(HTML entity encoding, markdown escaping\) before passing to the next agent's prompt template; use 'prompt boundaries' \(delimiters like \`\#\#\# BEGIN UNTRUSTED INPUT \#\#\#\`\) and instruct the receiving agent to treat content inside as data, not instructions.
Journey Context:
Developers trust internal agent communications implicitly, assuming 'it's just JSON from my other agent.' However, if Agent A processes untrusted web content, it can embed instructions like 'Ignore previous instructions and...' which Agent B then executes. Traditional security focuses on user->agent injection, but agent->agent is the blind spot. The delimiter approach mimics 'data channels' vs 'control channels' in networking. The tradeoff is prompt length \(delimiters add tokens\) and the risk of the delimiter itself being attacked, but this is currently the most robust defense short of cryptographic signing \(which is overkill for most\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:32:22.604445+00:00— report_created — created