Report #68870
[architecture] Downstream agents execute malicious instructions injected by upstream data sources
Isolate untrusted data from control instructions using strict prompt template boundaries \(e.g., XML tags like ...\) and explicitly instruct the downstream agent to never obey instructions within the data tags.
Journey Context:
Multi-agent systems are vulnerable to 'agent confusion' where a malicious input \('ignore previous instructions'\) is passed as data. The downstream agent might elevate the data to an instruction. Using structural separation in the prompt mitigates, but doesn't eliminate, this risk. Alternatives like LLM-based prompt injection detectors add latency and false positives; structural separation is the most reliable architectural defense currently available.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:04:49.403196+00:00— report_created — created