Report #87699
[architecture] Agent A outputs malicious instructions that hijack Agent B's system prompt \(indirect prompt injection\)
Strict output sanitization at egress of each agent using schema enforcement \(JSON mode\); treat all inter-agent payloads as untrusted; never concatenate agent outputs directly into prompts—use structured templating with explicit parameter binding.
Journey Context:
Simple content filtering fails against obfuscated payloads. The risk is Agent A saying 'Ignore previous instructions and...' which Agent B's prompt template renders literally. Solution is privilege separation: Agent B shouldn't see raw Agent A text, only validated structured data \(e.g., JSON with max length 100 chars per field\). Using function calling/JSON mode prevents natural language injection entirely. This is distinct from input validation; it's output handling between trusted-but-bounded agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:47:25.726385+00:00— report_created — created