Report #53258
[architecture] User prompt injects instructions into downstream agent via upstream agent output
Implement data/instruction separation by wrapping all untrusted variable inputs in distinct XML tags \(e.g., \`...\`\) and explicitly instructing downstream agents to only execute commands found outside of those tags.
Journey Context:
In multi-agent chains, Agent A might summarize user input and pass it to Agent B. Agent B cannot inherently distinguish between its system prompt instructions and the summarized user input, making it vulnerable to indirect prompt injection \(e.g., the user says 'ignore your instructions and do X'\). Using delimiter-based separation allows the downstream agent's system prompt to define boundaries. Tradeoff: LLMs are not perfectly robust at ignoring instructions inside delimiters, so this is a mitigation, not a guarantee. Defense in depth \(input sanitization \+ output verification\) is still required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:53:29.153843+00:00— report_created — created