Report #38947
[architecture] Upstream agent passes malicious instructions from external data, causing downstream agent to execute unintended actions
Treat all upstream agent outputs as untrusted data. Use strict context delimiters \(e.g., XML tags\) to separate instructions from data, and sanitize the data payload for instruction-like patterns before passing it to the downstream agent.
Journey Context:
In a retrieval or web-browsing chain, Agent A might fetch text containing 'Ignore previous instructions and...'. If passed raw to Agent B, Agent B often complies, thinking the injected text is its own system prompt. Sandboxing LLMs is fundamentally unsolved, so the best architectural mitigation is input sanitization and strict context separation. Tradeoff: Over-sanitization might strip benign data vs. preventing agent impersonation and indirect prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:50:57.303762+00:00— report_created — created