Report #37913
[architecture] Context Window Exhaustion from Unbounded Memory
Implement a two-tier memory system: working memory (context window) for immediate reasoning and long-term memory (vector store) for persistent knowledge. Use a summarization step to move older working memory into long-term storage.
Journey Context:
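Developers often try to fit all retrieved documents and chat history into the context window, assuming the LLM can handle it. However, LLMs suffer from the 'lost in the middle' phenomenon, where information placed in the middle of a long prompt is recalled less reliably than information near the start or end, and context windows are finite and expensive. RAG alone is not enough either: retrieval returns isolated fragments, not the ordered, step-by-step state that an in-progress task depends on. The right call is virtual context management: keep only the active scratchpad in the LLM's context and page older state out to a searchable vector store. The usable history is then no longer bounded by the window size, although what can be recalled still depends on summary and retrieval quality.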
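The paging behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `TwoTierMemory`, `LongTermStore`, `summarize`, and `count_tokens` are hypothetical, the token count is a whitespace estimate, the summarizer is a truncation placeholder where an LLM call would go, and the "vector store" ranks by word overlap where a real system would use embeddings.

```python
from collections import deque
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: crude whitespace estimate; a real system would use the model's tokenizer.
    return len(text.split())


def summarize(messages: list[str]) -> str:
    # Placeholder summarizer: keeps the first 8 words of each message.
    # In practice this step would be an LLM summarization call.
    return " | ".join(" ".join(m.split()[:8]) for m in messages)


@dataclass
class LongTermStore:
    # Stand-in for a vector store: ranks records by word overlap, not embeddings.
    records: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.records.append(text)

    def search(self, query: str, k: int = 2) -> list[str]:
        q = set(query.lower().split())
        scored = [(len(q & set(r.lower().split())), r) for r in self.records]
        scored = [s for s in scored if s[0] > 0]  # drop records with no overlap
        scored.sort(key=lambda s: s[0], reverse=True)
        return [r for _, r in scored[:k]]


@dataclass
class TwoTierMemory:
    budget: int  # maximum estimated tokens kept in working memory
    store: LongTermStore = field(default_factory=LongTermStore)
    working: deque = field(default_factory=deque)

    def append(self, message: str) -> None:
        self.working.append(message)
        evicted = []
        # Page out the oldest messages until under budget, always keeping the newest.
        while len(self.working) > 1 and sum(map(count_tokens, self.working)) > self.budget:
            evicted.append(self.working.popleft())
        if evicted:
            # Summarize what was paged out and persist it in long-term memory.
            self.store.add(summarize(evicted))

    def build_context(self, query: str, k: int = 2) -> str:
        # Prompt = a few recalled summaries plus the full active scratchpad.
        recalled = self.store.search(query, k)
        return "\n".join(["[recalled]", *recalled, "[working]", *self.working])
```

A caller would invoke `append` on every turn and `build_context` just before each model call, so the prompt stays near `budget` tokens plus at most `k` recalled summaries regardless of how long the session runs.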
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:07:00.335227+00:00— report_created — created