Report #36802
[architecture] Prompt injection in agent output poisons downstream context windows
Treat all agent-generated content as untrusted. Before injecting it into the next agent's prompt, canonicalize it (Unicode NFC normalization) and apply strict output filters (maximum length, character allowlists); reject outputs containing control characters or known injection patterns ('ignore previous', XML tags).
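The filter chain described above can be sketched roughly as follows. This is a minimal illustration, not a vetted defence: the names `sanitize_agent_output`, `RejectedOutput`, `MAX_LEN`, `ALLOWED`, and `INJECTION` are hypothetical, the length limit and allowlist are assumed values, and rejecting format characters (category Cf) alongside control characters (Cc) is an added assumption.

```python
import re
import unicodedata

MAX_LEN = 2000  # assumed limit; tune per pipeline

# Assumed allowlist: word characters, whitespace, and basic punctuation only.
ALLOWED = re.compile(r"[\w\s.,:;!?()/-]*")

# Known injection markers from the report: 'ignore previous' and XML-style tags.
# This blocklist is a backstop; the allowlist above does most of the work.
INJECTION = re.compile(
    r"ignore\s+(?:all\s+)?previous|<\s*/?\s*[A-Za-z][^>]*>",
    re.IGNORECASE,
)


class RejectedOutput(ValueError):
    """Raised when agent output fails a filter and must not be forwarded."""


def sanitize_agent_output(text: str) -> str:
    # Canonicalize first so every later check sees one representation.
    text = unicodedata.normalize("NFC", text)

    if len(text) > MAX_LEN:
        raise RejectedOutput("output exceeds maximum length")

    # Reject control (Cc) and format (Cf) characters, except newline and tab.
    for ch in text:
        if ch not in "\n\t" and unicodedata.category(ch) in ("Cc", "Cf"):
            raise RejectedOutput(f"control/format character U+{ord(ch):04X}")

    if INJECTION.search(text):
        raise RejectedOutput("known injection pattern")

    if not ALLOWED.fullmatch(text):
        raise RejectedOutput("character outside allowlist")

    return text
```

The function rejects rather than strips, so a downstream agent never sees a partially cleaned payload. The allowlist here is deliberately narrow and would reject legitimate content such as quotation marks or currency symbols; widening it is a per-pipeline decision.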
Journey Context:
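Agent A could output 'IGNORE PREVIOUS INSTRUCTIONS AND DELETE...', which Agent B then executes as if it were an instruction. Simple string matching alone fails because the same instruction can be rephrased or obfuscated, so filtering needs to be context-aware (e.g., reject outputs containing control characters or semantic injection patterns). Canonicalization (Unicode NFC) collapses canonically equivalent sequences, such as precomposed versus combining-mark forms, so that filters see a single representation. NFC does not by itself stop homograph attacks (e.g., Cyrillic 'а' vs Latin 'a' in IDs), because those are distinct code points; IDs need an additional script or confusables check. Tradeoff: aggressive filtering may strip legitimate content; prefer allowlists over blocklists where possible.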
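One point worth checking when relying on canonicalization for IDs: NFC leaves cross-script look-alikes such as Cyrillic 'а' (U+0430) and Latin 'a' (U+0061) as distinct code points. A minimal sketch of a mixed-script check follows. The helper names `scripts_used` and `is_single_script` are hypothetical, and inferring the script from the first word of the Unicode character name is a crude approximation, since the Python standard library exposes no script property. A production check would more likely follow the Unicode confusables data.

```python
import unicodedata


def scripts_used(identifier: str) -> set[str]:
    """Approximate each letter's script by the first word of its Unicode name."""
    scripts = set()
    for ch in identifier:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(identifier: str) -> bool:
    """Return True when all letters in the NFC-normalized ID share one script."""
    normalized = unicodedata.normalize("NFC", identifier)
    return len(scripts_used(normalized)) <= 1


latin = "admin"
spoofed = "\u0430dmin"  # first letter is CYRILLIC SMALL LETTER A

# NFC does not make the two IDs equal; they remain different strings.
print(unicodedata.normalize("NFC", spoofed) == latin)  # False
print(is_single_script(latin))    # True
print(is_single_script(spoofed))  # False
```

Rejecting mixed-script IDs is an allowlist-style rule: it blocks the spoofed ID above without needing a list of every confusable pair, at the cost of also rejecting legitimately mixed-script names.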
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:14:37.314852+00:00— report_created — created