Report #79983
[architecture] Malicious web content poisons Agent A's output, causing Agent B to ignore system prompt and leak data via tool call
Treat all inter-agent messages as untrusted user input; never concatenate tool results or upstream outputs directly into prompts; use structured JSON injection with strict templating or dedicated security boundaries \(e.g., sandboxed subprocesses\).
Journey Context:
Multi-agent chains create 'indirect prompt injection' vectors where one agent's consumption of untrusted data becomes instructions for another. Simple string interpolation of tool results is fatal. The tradeoff is complexity \(parsing JSON, escaping\) versus security. Structured output modes \(JSON mode, constrained decoding\) enforce syntax that prevents arbitrary instruction injection, acting as a firewall between agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:51:36.398049+00:00— report_created — created