Report #3986
[architecture] Overloading the LLM context window with long-term memory instead of using external retrieval
Implement a tiered memory system: L1 (working memory, in-context), L2 (episodic/semantic memory in a vector DB), L3 (archival raw logs). Promote a memory to L1 only when it is highly relevant to the current task.
Journey Context:
Developers often try to stuff all historical context into the prompt to avoid the complexity of RAG. This runs into token limits and degrades the model's instruction-following, partly because of the 'lost in the middle' phenomenon, where information placed in the middle of a long context is used less reliably than information near the start or end. Conversely, pure RAG often misses implicit context that was never phrased in a way a query would match. The right call is a tiered architecture in which working memory (L1) holds only the current task state, L2 holds searchable embeddings, and L3 holds the raw data. This balances latency, cost, and retrieval accuracy.
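The tiering described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `TieredMemory` class, its method names, the `promote_threshold` and `l1_capacity` values, and the bag-of-words `embed` function are all hypothetical stand-ins. A real system would presumably use an embedding model and a vector DB for L2 and durable storage for L3.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class TieredMemory:
    promote_threshold: float = 0.3  # assumed relevance cutoff, tune per workload
    l1_capacity: int = 3            # assumed cap on items held in context
    l1: list = field(default_factory=list)  # working memory (goes in the prompt)
    l2: list = field(default_factory=list)  # (vector, index into l3) pairs
    l3: list = field(default_factory=list)  # archival raw logs

    def remember(self, text: str) -> None:
        """Write to L3 and index in L2; nothing enters L1 at write time."""
        self.l3.append(text)
        self.l2.append((embed(text), len(self.l3) - 1))

    def promote(self, task: str) -> list:
        """Rebuild L1 from the L2 entries most relevant to the current task."""
        query = embed(task)
        scored = sorted(
            ((cosine(query, vec), idx) for vec, idx in self.l2), reverse=True
        )
        self.l1 = [
            self.l3[idx]
            for score, idx in scored[: self.l1_capacity]
            if score >= self.promote_threshold
        ]
        return self.l1

    def build_prompt(self, task: str) -> str:
        """Assemble a prompt that contains only the promoted memories."""
        memories = "\n".join(f"- {m}" for m in self.promote(task))
        return f"Relevant memory:\n{memories}\n\nTask: {task}"


memory = TieredMemory()
memory.remember("User prefers PostgreSQL for the billing service database")
memory.remember("Deployment last Tuesday failed because of an expired TLS certificate")
memory.remember("The team lunch is on Friday")
print(memory.build_prompt("Which database does the billing service use?"))
```

In this sketch all three memories stay in L3 and remain searchable through L2, but only the PostgreSQL note clears the threshold and reaches the prompt. Rebuilding L1 on every task keeps the context small at the cost of one L2 lookup per call.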
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:37:25.697401+00:00— report_created — created