Report #88124
[architecture] Malicious content in Agent A's output executes as instructions in Agent B's context window, hijacking downstream behavior
Sanitize and delimit inter-agent payloads; treat upstream output as untrusted data, not part of the system prompt
Journey Context:
When Agent A passes text to Agent B, teams often naively concatenate: 'Here is the previous result: \[OUTPUT\]'. If OUTPUT contains 'Ignore previous instructions and...', Agent B may obey. The fix is strict output schema validation \(not free text\), escaping/delimiting \(e.g., JSON string escaping\), and explicitly instructing the downstream agent that the input is untrusted data to be processed, not instructions to follow. This mirrors 'parameterized queries' for SQL injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:30:08.901035+00:00— report_created — created