Report #99767
[architecture] Stored user content contains instructions that later get executed as part of the agent's context
Treat all stored memory as untrusted data: sanitize before embedding, isolate memory content from system instructions with structural delimiters, and never execute raw strings retrieved from memory without validation.
Journey Context:
This is the prompt-injection risk of memory: if an agent stores a user message or tool output that contains an embedded instruction ('Ignore previous instructions and...'), that stored text can be retrieved into a future prompt and influence behavior. The fix is defense in depth: content passes through a write-time filter, retrieved memory is structurally delimited from system instructions, and any retrieved command-like text is parsed and validated rather than pasted into a shell or tool call. A dangerous shortcut is treating long-term memory as 'trusted context' because the agent itself stored it; the agent only copied the text, so its origin is still untrusted.
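The three layers described above could be sketched roughly as follows in Python. This is a minimal illustration, not a complete defense. The names `filter_on_write`, `render_memory`, and `validate_command`, the `<memory>` delimiter format, the regex patterns, and the command allowlist are all hypothetical choices made for this example. A pattern filter like this catches only obvious phrasing and should not be relied on alone.

```python
import html
import re
import shlex

# Hypothetical patterns; a real filter would need a broader, maintained set.
SUSPICIOUS = re.compile(
    r"ignore\s+(all\s+)?previous\s+instructions|disregard\s+the\s+system\s+prompt",
    re.IGNORECASE,
)

# Hypothetical allowlist of executables the agent may invoke.
ALLOWED_COMMANDS = {"ls", "cat", "grep"}

# Characters that only matter to a shell; reject them outright.
SHELL_META = re.compile(r"[;&|`$<>]")


def filter_on_write(text: str) -> dict:
    """Write-time filter: store the text with a flag rather than trusting it."""
    return {"text": text, "flagged": bool(SUSPICIOUS.search(text))}


def render_memory(records: list[dict]) -> str:
    """Wrap retrieved memory in delimiters so it is presented as data."""
    blocks = []
    for record in records:
        # Escape angle brackets so stored text cannot close the delimiter early.
        body = html.escape(record["text"], quote=False)
        flagged = "true" if record["flagged"] else "false"
        blocks.append(
            f'<memory untrusted="true" flagged="{flagged}">\n{body}\n</memory>'
        )
    return "\n".join(blocks)


def validate_command(raw: str) -> list[str]:
    """Parse command-like text into argv; raise ValueError if not allowed."""
    try:
        argv = shlex.split(raw)
    except ValueError as exc:
        raise ValueError(f"unparseable command: {raw!r}") from exc
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {raw!r}")
    if any(SHELL_META.search(token) for token in argv):
        raise ValueError(f"shell metacharacters rejected: {raw!r}")
    return argv
```

The list returned by `validate_command` is intended to be passed to something like `subprocess.run(argv)` without `shell=True`, so the retrieved string is never interpreted by a shell. The system prompt would also need to state that anything inside `<memory>` blocks is data to be read, not instructions to follow.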
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:01:51.824146+00:00— report_created — created