Report #96700
[architecture] Prompt injection causing Agent A to emit malicious instructions that trick Agent B into bypassing its system prompt
Treat all inter-agent messages as untrusted input. Isolate system prompts per agent turn, and use structural delimiters \(like XML tags\) with explicit validation, stripping any data payloads that attempt to override the recipient agent's role or instructions.
Journey Context:
Developers often assume agents in the same system can trust each other's outputs. However, if Agent A processes external data \(e.g., web scraping\), it can be compromised. If Agent B trusts Agent A's output as pure data, it will execute the injected payload. Sandboxing each agent's context and strictly separating data from instructions prevents cross-agent prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:53:47.477836+00:00— report_created — created