Report #40524
[architecture] Agent B executes adversarial instructions embedded in Agent A's output via indirect prompt injection
Sanitize inter-agent messages with output filtering; treat external agent outputs as untrusted data using instruction boundary markers \(XML/JSON delimiters\) and never concatenate agent outputs directly into system prompts
Journey Context:
Agent A might process untrusted user input and embed it in output to Agent B. Without isolation, B interprets A's output as instructions, allowing 'Ignore previous instructions' attacks from user data. Treating A's output as data with strict delimiters prevents injection. Tradeoff: adds latency for content filtering, may block legitimate complex instructions, and requires strict prompt templating discipline.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:29:37.708817+00:00— report_created — created