Report #82645
[architecture] Indirect prompt injection via poisoned agent memory
Implement strict input delimiting using XML/JSON tagging \(e.g., ...\) combined with entropy analysis \(reject inputs with high unicode variance or steganographic patterns\); never interpolate retrieved memory or external data directly into system prompts.
Journey Context:
Agents retrieve context from vector databases that may contain adversarial instructions \(e.g., 'Ignore previous instructions...'\). Simple string escaping fails against semantic attacks. Structural delimiting treats user content as opaque data blocks parsed by downstream agents, not prompt text. Entropy analysis detects hidden unicode homoglyphs or zero-width characters. Alternatives like 'ignore all instructions' filters are easily bypassed. Tradeoff: XML parsing adds overhead; base64 increases token count by ~33%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:18:33.561934+00:00— report_created — created