Report #44011
[architecture] Over-retrieving from the vector store instead of keeping active state in context
Treat the LLM context window as main memory (RAM) and the vector store as disk (archival storage). Keep active, highly relevant state in the context window, and use the vector store only for archival recall. When the context window approaches its token limit, move content out to archival storage.
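A minimal sketch of the two-tier layout, assuming a whitespace word count as a stand-in for a real tokenizer and a plain list as a stand-in for the vector store. `TieredMemory`, `count_tokens`, and `archive_threshold` are hypothetical names for illustration, not any library's API. When usage crosses the threshold, the oldest messages are evicted to the archive while the newest stay in context:

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    seq: int   # position in the conversation, kept so order can be restored later
    text: str


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


@dataclass
class TieredMemory:
    context_limit: int               # max tokens the main context (RAM) may hold
    archive_threshold: float = 0.8   # start archiving above this fraction of the limit
    main: list[Message] = field(default_factory=list)     # active state, in order
    archive: list[Message] = field(default_factory=list)  # stand-in for the vector store
    next_seq: int = 0

    def used_tokens(self) -> int:
        return sum(count_tokens(m.text) for m in self.main)

    def add(self, text: str) -> None:
        self.main.append(Message(self.next_seq, text))
        self.next_seq += 1
        self._maybe_archive()

    def _maybe_archive(self) -> None:
        budget = int(self.context_limit * self.archive_threshold)
        # Evict oldest messages first, but always keep the newest one in context.
        while self.used_tokens() > budget and len(self.main) > 1:
            self.archive.append(self.main.pop(0))

    def context_window(self) -> str:
        # What would actually be sent to the model on the next turn.
        return "\n".join(m.text for m in self.main)


mem = TieredMemory(context_limit=10)
for t in ["one two three", "four five six", "seven eight nine", "ten eleven"]:
    mem.add(t)
```

With a limit of 10 and a threshold of 0.8, the budget is 8 tokens, so only the first message is archived and the three most recent stay in the context window. Archiving before the hard limit leaves headroom for the model's reply; the exact threshold is a tuning choice, not a fixed rule.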
Journey Context:
Developers often treat the vector DB as the primary memory and retrieve from it on every turn. This loses sequential coherence, increases latency, and wastes context window space on low-relevance data. Main memory is fast but small; archival storage is large but must be searched and does not preserve temporal order. The tradeoff is disciplined context management versus the convenience of plain RAG, but long-running tasks require explicit context window management.
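The following sketches archival recall as an explicit, on-demand search rather than a per-turn dump. It assumes a toy bag-of-words cosine similarity in place of a real embedding model and vector DB; `archival_search`, `ArchivedItem`, `k`, and `min_score` are hypothetical. Because similarity ranking discards temporal order, the hits are re-sorted by their original sequence number before being placed back into context:

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class ArchivedItem:
    seq: int   # original position in the conversation
    text: str


def _vector(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def archival_search(archive: list[ArchivedItem], query: str,
                    k: int = 2, min_score: float = 0.1) -> list[ArchivedItem]:
    q = _vector(query)
    scored = [(_cosine(q, _vector(item.text)), item) for item in archive]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep only the top-k hits above the relevance floor, so low-relevance
    # data does not consume context window space.
    hits = [item for score, item in ranked if score >= min_score][:k]
    # Similarity ranking loses temporal order; restore it by sequence number.
    return sorted(hits, key=lambda item: item.seq)


archive = [
    ArchivedItem(0, "user prefers metric units"),
    ArchivedItem(1, "deploy target is staging cluster"),
    ArchivedItem(2, "user timezone is UTC"),
    ArchivedItem(3, "staging cluster uses three replicas"),
]
hits = archival_search(archive, "staging cluster replicas", k=2)
```

Here item 3 scores highest, but the returned hits come back as items 1 then 3, in conversation order. A caller would invoke this only when the active context lacks something it needs, rather than on every turn.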
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:20:41.196266+00:00 | report_created | created