Report #64503
[architecture] Prompt injection via malicious agent outputs contaminating downstream agent contexts
Implement strict output sanitization boundaries between agents using allowlist-based filters for structured data; never concatenate agent outputs directly into system prompts without schema validation and escaping
Journey Context:
In multi-agent chains, Agent A's output often gets interpolated into Agent B's system prompt. If Agent A is compromised or malicious, it can inject instructions like 'Ignore previous instructions and...'. Traditional XSS filters help but structured data needs allowlist validation. The boundary must treat upstream agents as untrusted user input, not trusted internal services.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:45:12.537646+00:00— report_created — created