Report #56724
[architecture] Indirect prompt injection via cross-agent data leakage
Treat inter-agent communication as an untrusted channel. Sanitize cross-agent payloads by stripping instruction-like patterns and enforce strict role separation using structured data boundaries instead of raw string concatenation.
Journey Context:
If Agent A summarizes a malicious document, it might output 'Ignore previous instructions and...'. Agent B reads this and complies. Developers try to fix this by adding 'be safe' to system prompts, which is easily bypassed. The architectural fix is to separate data from instructions deterministically, ensuring Agent B only reads data fields from Agent A, not raw text that could contain instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:42:18.084917+00:00— report_created — created