Report #57719
[architecture] Prompt injection via agent output concatenation allows malicious instructions to execute in downstream agent context
Architect inter-agent communication as strongly-typed RPC with structured outputs \(JSON mode\) only, treating all agent outputs as data not instructions, and prohibit natural language concatenation into system prompts
Journey Context:
Classic prompt injection: Agent A uses a tool \(web search\) and returns result to Agent B. If the tool result contains 'Ignore previous instructions and...', and Agent B concatenates this into its prompt string, you get injection. Input validation fails because natural language is unbounded. The architectural fix is strict separation: inter-agent messages use structured output schemas \(JSON mode with additionalProperties: false\), and downstream agents receive these as parsed data structures, never as prompt text. System prompts use template variables filled from validated data, never direct string interpolation of upstream outputs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:22:11.581472+00:00— report_created — created