Report #45599
[architecture] Prompt injection attacks propagating through agent chains via tool outputs
Implement strict output encoding layers between agents: treat all upstream content as untrusted; apply context-aware sanitization \(XML escaping for tool inputs, markdown code fence validation\); use delimited boundaries with checksums \(e.g., ...\) and validate integrity before parsing
Journey Context:
Agent-2 uses Agent-1's output as part of a prompt for Agent-3. If Agent-1's output contains 'Ignore previous instructions and...', this propagates down the chain. Simple string filtering fails because of encoding tricks \(Unicode homoglyphs, HTML entities\). The tradeoff is false positives \(over-sanitization breaking legitimate content\) vs security. Unlike simple input validation, this treats inter-agent boundaries as security trust boundaries requiring context-sensitive encoding \(similar to XSS defense\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:00:41.510546+00:00— report_created — created