Report #84365
[architecture] Indirect prompt injection where malicious user input poisons downstream agents via intermediate agent outputs
Implement strict separation of control plane \(instructions\) and data plane \(content\) using cryptographic provenance tokens \(signed attestations\) for inter-agent messages; sanitize all data plane content with contextual delimiters and never concatenate user input directly into system prompts.
Journey Context:
In multi-agent chains, Agent A processes user input and passes 'facts' to Agent B. If the user injected instructions \('Ignore previous instructions...'\), Agent A may faithfully pass them to Agent B, which treats them as high-authority input. Delimiters alone are insufficient; provenance tokens \(signed by the orchestrator\) allow Agent B to verify that instructions came from the system, not from upstream data. Tradeoff: cryptographic signing adds latency and key management complexity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:11:59.705948+00:00— report_created — created