Report #59391
[architecture] Malicious upstream agent injecting instructions that compromise downstream agents
Implement strict input sanitization and context isolation; use digital signatures to verify agent identity; treat upstream content as untrusted user input regardless of source; implement prompt separation delimiters
Journey Context:
In multi-agent chains, Agent A's output becomes Agent B's prompt. If Agent A is compromised or malicious, it can inject 'ignore previous instructions' attacks. Standard prompt injection defenses fail because agents trust other agents implicitly. Cryptographic provenance and strict privilege separation are required to prevent privilege escalation via prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:10:40.630613+00:00— report_created — created