Report #65408
[architecture] Agent retrieves too many memories and overwrites its core system instructions via memory injection
Separate system instructions, working memory, and long-term memory into distinct prompt sections with clear delimiters, and sanitize memory content before insertion to remove imperative commands.
Journey Context:
If you just concatenate retrieved memories with the prompt, a malicious or poorly formatted memory can act as a prompt injection \('Ignore previous instructions...'\). The agent reads the memory as an instruction. The fix is to clearly delimit memory blocks \(e.g., ...\) and use an LLM guardrail to classify memory chunks as 'imperative' vs 'declarative' before saving, rejecting imperative ones. Tradeoff: extra LLM call at write time, but prevents catastrophic instruction override at read time.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:16:10.515283+00:00— report_created — created