Report #5489
[architecture] Over-relying on RAG for immediate working state or stuffing all history into the context window
Implement a two-tier memory system: use the context window as fast 'working memory' for the current task and recent turns, and use a vector store as 'long-term memory' for episodic and semantic knowledge. Manage the context window as a rolling buffer, so the oldest turns drop out as new ones arrive.
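A minimal sketch of the two tiers, under stated assumptions: `Turn`, `TwoTierMemory`, `count_tokens`, and `token_budget` are hypothetical names, a whitespace word count stands in for a real tokenizer, and a plain list stands in for the vector store. It only illustrates the rolling-buffer idea: turns evicted from working memory are moved to long-term memory rather than discarded.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Turn:
    seq: int   # monotonically increasing turn number, preserves temporal order
    role: str
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count stands in for a real tokenizer.
    return len(text.split())


@dataclass
class TwoTierMemory:
    token_budget: int
    working: deque = field(default_factory=deque)  # tier 1: what goes in the context window
    long_term: list = field(default_factory=list)  # tier 2: stand-in for a vector store
    _next_seq: int = 0

    def working_tokens(self) -> int:
        return sum(count_tokens(t.text) for t in self.working)

    def add_turn(self, role: str, text: str) -> None:
        self.working.append(Turn(self._next_seq, role, text))
        self._next_seq += 1
        # Rolling buffer: evict the oldest turns until the budget is met,
        # always keeping at least the newest turn in working memory.
        while self.working_tokens() > self.token_budget and len(self.working) > 1:
            self.long_term.append(self.working.popleft())
```

In a real system the eviction step would presumably embed the turn (or a summary of it) before writing it to the store; that part is left out here because it depends on the embedding model and store in use.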
Journey Context:
Context windows are fast but limited and expensive; vector stores are effectively unbounded but lossy, and they add retrieval latency. Putting everything in context distracts the model and runs into token limits. Using RAG for immediate state loses co-reference resolution and temporal ordering. Virtual context management bridges this gap by treating the context window as a cache for the larger external memory.
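The cache analogy can be sketched as a "page-in" step that pulls a few relevant memories from external storage into the context ahead of the recent turns. This is an illustration under assumptions, not a reference implementation: `Memory`, `overlap_score`, and `build_context` are hypothetical names, and a shared-word count stands in for embedding similarity. Retrieved items are re-sorted by sequence number so that temporal ordering survives retrieval.

```python
from typing import NamedTuple


class Memory(NamedTuple):
    seq: int   # original turn number, used to restore temporal order
    text: str


def overlap_score(query: str, text: str) -> int:
    # Assumption: shared-word count stands in for embedding similarity.
    return len(set(query.lower().split()) & set(text.lower().split()))


def build_context(query, recent, archive, k=2):
    """Page the k most relevant archived memories in ahead of the recent turns."""
    scored = [(overlap_score(query, m.text), m) for m in archive]
    # Keep only memories with some relevance, best first, at most k of them.
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)[:k]
    # Restore chronological order among the paged-in memories.
    paged_in = sorted((m for _, m in hits), key=lambda m: m.seq)
    # Recent turns stay verbatim and in order, so co-references still resolve.
    return paged_in + list(recent)


archive = [
    Memory(0, "user prefers dark mode"),
    Memory(1, "deploy target is staging"),
    Memory(2, "user timezone is UTC"),
]
recent = [Memory(3, "what is the deploy status")]
context = build_context("which deploy target should I use", recent, archive, k=1)
```

Recent turns are never routed through retrieval in this sketch; only older material is fetched on demand, which matches the split between working state and long-term memory described above.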
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:32:55.423898+00:00— report_created — created