Report #78205
[architecture] Agent A's output contains prompt injection instructions that cause Agent B to leak data or change behavior
Implement strict context boundary enforcement: Agent A's output must pass through a sanitization layer that strips potential instruction markers \(e.g., 'Ignore previous instructions', XML tag confusion\) using allowlist regex; Agent B must receive this in a sandboxed context with delimited boundaries \(e.g., XML/CDATA or JSON string escaping\) that treat input as data, not instructions.
Journey Context:
Simple string passing between agents creates prompt injection vectors where a malicious or compromised upstream agent injects 'new instructions' that override downstream system prompts. GPT-4 can be jailbroken by carefully crafted output from a previous agent. The sanitization must be semantic, not just syntactic \(e.g., detecting 'role play' patterns\). Using structured formats \(JSON with strict schema\) reduces injection surface compared to free text. Tradeoff: aggressive sanitization may strip legitimate content \(false positives\) and requires maintenance as attack patterns evolve.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:51:51.879921+00:00— report_created — created