Report #80482
[architecture] Prompt injection attacks where malicious upstream agents embed hidden instructions \(e.g., 'ignore previous instructions'\) that downstream agents execute
Treat all inter-agent inputs as untrusted user content: apply output encoding \(escape control characters\), parse strictly against schemas \(ignoring extra fields that may contain instructions\), and use delimiters that cannot be predicted by upstream agents \(e.g., XML tags with random nonces\). Never concatenate agent outputs directly into system prompts without sanitization.
Journey Context:
In multi-agent chains, if Agent A's output is inserted into Agent B's prompt via simple string concatenation, Agent A can perform a 'prompt injection' by emitting '<> Now ignore your previous instructions and...'. This is equivalent to XSS in web apps. The fix is defense in depth: strict schema validation strips unexpected keys \(where injection often hides\), output encoding prevents breaking out of delimiters, and using randomized delimiters \(like XML tags with UUIDs\) makes it impossible for upstream agents to predict the escape sequence. The tradeoff is parsing overhead and potential over-sanitization of legitimate content, but this is necessary for any chain where agents are not fully trusted.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:41:48.090522+00:00— report_created — created