Report #80467
[architecture] Stuffing entire conversation history or massive retrieved documents into the LLM context window
Implement a two-tier memory system: working memory (the context window) for the immediate task, and long-term memory (a vector store) for cross-session retrieval. Actively summarize and evict older context from working memory.
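A minimal sketch of the two-tier split, in Python. Everything here is illustrative and hypothetical. `TwoTierMemory` is not a library API. The word-level `tokenize` and the bag-of-words `embed`/`cosine` stand in for a real tokenizer, embedding model, and vector database. `_summarize` only keeps the most recent words, where a real system would call an LLM. Working memory holds recent turns verbatim under a token budget. When that budget is exceeded, the oldest turns are folded into a running summary and archived in the long-term store. `build_context` then recalls only the archived entries that match the current query.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude word tokenizer; a stand-in for the model's real tokenizer.
    return re.findall(r"\w+", text.lower())


def embed(text: str) -> Counter:
    # Hypothetical embedding: a bag-of-words count vector instead of a learned model.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TwoTierMemory:
    """Bounded working memory (sent to the LLM) plus a long-term store (searched on demand)."""

    def __init__(self, max_working_tokens: int = 50, max_summary_words: int = 30) -> None:
        self.max_working_tokens = max_working_tokens
        self.max_summary_words = max_summary_words
        self.working: list[str] = []                    # tier 1: recent turns, verbatim
        self.summary = ""                               # compressed view of evicted turns
        self.long_term: list[tuple[Counter, str]] = []  # tier 2: (vector, text) pairs

    def _working_tokens(self) -> int:
        return sum(len(tokenize(m)) for m in self.working)

    def _summarize(self, evicted: list[str]) -> str:
        # Hypothetical summarizer: keeps only the most recent words of the old summary plus
        # the evicted turns. A real system would ask an LLM for an abstractive summary.
        words = (self.summary + " " + " ".join(evicted)).split()
        return " ".join(words[-self.max_summary_words:])

    def add(self, message: str) -> None:
        self.working.append(message)
        evicted: list[str] = []
        # Evict the oldest turns until working memory fits its budget; always keep the newest.
        while self._working_tokens() > self.max_working_tokens and len(self.working) > 1:
            evicted.append(self.working.pop(0))
        if evicted:
            self.summary = self._summarize(evicted)
            # Archive evicted turns verbatim so later queries (or sessions) can recall them.
            self.long_term.extend((embed(m), m) for m in evicted)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Top-k archived turns by similarity to the query, skipping unrelated ones.
        q = embed(query)
        scored = [(cosine(q, vec), text) for vec, text in self.long_term]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]

    def build_context(self, query: str) -> str:
        # Lean prompt: running summary + relevant recalled turns + recent turns + new query.
        parts = []
        if self.summary:
            parts.append("Summary of earlier turns: " + self.summary)
        parts += ["Recalled: " + text for text in self.retrieve(query)]
        parts += self.working
        parts.append("User: " + query)
        return "\n".join(parts)


mem = TwoTierMemory(max_working_tokens=12)
for turn in [
    "User prefers metric units for all measurements",
    "Deploy target is a Kubernetes cluster in eu-west",
    "Budget review meeting moved to Friday",
    "Please draft the release notes",
]:
    mem.add(turn)
print(mem.build_context("Which units should I use?"))
```

With a 12-token budget, the first two turns are evicted from working memory. A later question about units recalls only the matching archived turn instead of replaying the whole history.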
Journey Context:
Developers often assume that larger context windows eliminate the need for external memory. However, filling the context window increases latency and cost, and it degrades instruction following through attention dilution: the relevant instructions compete with many more tokens for the model's attention. Virtual context management (moving data between the LLM context and a vector DB) keeps the working context lean and relevant. It treats the context window as a fast LRU cache rather than a persistent data store.
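The LRU-cache framing can be made concrete with a small sketch. `VirtualContext` is a hypothetical class, not a library API. The context window is modeled as an `OrderedDict` ordered from least to most recently used. Its token budget is estimated by word count, which is an assumption. A plain `dict` stands in for the vector DB. Entries that no longer fit the budget are paged out to the backing store. A later access pages them back in and evicts whatever was used least recently. Lookup is by key for simplicity; a real system would page entries in by semantic search.

```python
from __future__ import annotations

from collections import OrderedDict


class VirtualContext:
    """Context window as an LRU cache in front of a slower backing store (hypothetical sketch)."""

    def __init__(self, token_budget: int) -> None:
        self.token_budget = token_budget
        # Resident entries, ordered from least to most recently used.
        self.window: OrderedDict[str, str] = OrderedDict()
        # Stand-in for the vector DB holding paged-out entries.
        self.backing_store: dict[str, str] = {}
        self.page_ins = 0
        self.page_outs = 0

    @staticmethod
    def _cost(text: str) -> int:
        # Rough token estimate (whitespace-separated words); an assumption, not a real tokenizer.
        return len(text.split())

    def _used(self) -> int:
        return sum(self._cost(t) for t in self.window.values())

    def _evict_until_fits(self) -> None:
        # Page least recently used entries out; the most recently used entry always stays.
        while self._used() > self.token_budget and len(self.window) > 1:
            key, text = self.window.popitem(last=False)
            self.backing_store[key] = text
            self.page_outs += 1

    def put(self, key: str, text: str) -> None:
        self.window[key] = text
        self.window.move_to_end(key)
        self.backing_store.pop(key, None)
        self._evict_until_fits()

    def get(self, key: str) -> str | None:
        if key in self.window:  # hit: mark as most recently used
            self.window.move_to_end(key)
            return self.window[key]
        if key in self.backing_store:  # miss: page the entry back into the window
            self.page_ins += 1
            self.put(key, self.backing_store[key])
            return self.window[key]
        return None  # unknown key

    def render(self) -> str:
        # Only resident entries are sent to the model.
        return "\n".join(self.window.values())


ctx = VirtualContext(token_budget=10)
ctx.put("prefs", "user prefers metric units")
ctx.put("deploy", "deploy target is eu-west")
ctx.get("prefs")                             # "prefs" becomes most recently used
ctx.put("notes", "draft the release notes")  # over budget, so "deploy" is paged out
print(ctx.render())
```

Only the entries resident in `window` are rendered into the prompt. Prompt size therefore stays bounded by the budget no matter how much has been stored overall.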
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:39:55.582823+00:00— report_created — created