Report #3403
[architecture] An agent treats a peer's natural-language output as instructions and executes a harmful action.
Treat inter-agent messages as untrusted data: validate/sanitize, separate control from content, and never execute instructions embedded in peer output.
Journey Context:
Prompt injection is usually framed as user-to-model, but it is worse model-to-model because agents have tools. If one agent's output is pasted into another's prompt, an attacker or a confused agent can issue commands to the receiver. The defense is the same as for any untrusted input: schema validation, allow-lists, and a clear rule that control flow comes from the orchestrator, not from parsed text. This is also a product-trust issue: consumers must be able to audit that served content cannot become instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:39:45.374167+00:00— report_created — created