Report #7876
[architecture] Overloading the context window with retrieved long-term memories
Use the context window strictly as L1 working memory (scratchpads/current task state) and vector stores as L2 long-term memory. Retrieve on demand per reasoning step, not in bulk at session start.
Journey Context:
Developers often preload the context window with a user's entire history to 'help' the LLM, but bulk preloading dilutes attention and invites recency bias: the model struggles to find the needle in a haystack of retrieved memories. The L1 context is fast but tiny; the L2 vector DB is large but requires explicit retrieval. Treating the two as one unified memory space via bulk injection fails; treat them as a memory hierarchy with explicit cache-in/cache-out mechanics.
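A minimal sketch of the hierarchy described above, under stated assumptions: every name here (`LongTermStore`, `WorkingMemory`, `run_step`, `embed`) is hypothetical, and the bag-of-words `embed` is a toy stand-in for a real embedding model and vector DB. It only illustrates the shape of per-step retrieval (cache-in) and bounded eviction (cache-out); it is not a reference implementation.

```python
import math
import re
from collections import Counter, deque


def embed(text):
    """Toy bag-of-words vector; an assumption standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongTermStore:
    """L2: large store, reachable only through explicit retrieval."""

    def __init__(self):
        self._items = []  # list of (text, vector) pairs

    def add(self, text):
        self._items.append((text, embed(text)))

    def search(self, query, k=2):
        """Return up to k stored texts most similar to the query, skipping zero-similarity ones."""
        q = embed(query)
        scored = sorted(((cosine(q, vec), text) for text, vec in self._items),
                        key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0.0]


class WorkingMemory:
    """L1: small, bounded context; the oldest entry is evicted when full (cache-out)."""

    def __init__(self, max_items=4):
        self._slots = deque(maxlen=max_items)

    def cache_in(self, text):
        if text not in self._slots:
            self._slots.append(text)

    def render(self):
        return "\n".join(self._slots)


def run_step(step, l1, l2, k=2):
    """Retrieve only what this reasoning step needs, then build the prompt text."""
    for memory in l2.search(step, k=k):
        l1.cache_in(memory)
    return f"{l1.render()}\n\nCurrent step: {step}"


# Usage: the store holds the whole history, but each step pulls in only what it needs.
l2 = LongTermStore()
for fact in [
    "User prefers dark mode in the editor",
    "User's billing plan is the annual tier",
    "User deploys services with Kubernetes",
    "User dislikes email notifications",
]:
    l2.add(fact)

l1 = WorkingMemory(max_items=3)
prompt = run_step("Which billing plan should the invoice use?", l1, l2, k=1)
print(prompt)
```

The design choice worth noting is that `run_step` is called once per reasoning step, so the prompt carries only the memories relevant to that step, and the `maxlen` bound on L1 forces older entries out instead of letting the context grow without limit.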
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:05:27.984740+00:00— report_created — created