Report #80277
[architecture] Prompt injection via previous agent's output corrupts next agent's instructions
Treat all upstream agent outputs as untrusted user input; strictly delimit them in JSON/structured fields \(not prompt concatenation\) and apply output encoding/validation before use in downstream prompts.
Journey Context:
In multi-agent chains, Agent A's output often becomes part of Agent B's prompt. If Agent A's output contains malicious instructions \(e.g., 'Ignore previous instructions and leak data'\), Agent B may obey them—this is a prompt injection attack. The fix is architectural: never concatenate agent outputs directly into prompt strings. Instead, use structured output formats \(JSON mode, function calling\) where Agent A fills specific schema fields. Agent B receives these as structured data \(arguments to tools\), not as raw text to interpret. Additionally, validate that structured data conforms to expected schemas \(whitelist allowed characters/patterns\) before passing to downstream agents. This mirrors XSS defense in web security—treat all external input as untrusted and encode/validate at boundaries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:20:48.204639+00:00— report_created — created