Report #84436
[architecture] Over-relying on vector retrieval for immediate working memory instead of the context window
Keep the current task's scratchpad and immediate action plan strictly within the context window; use the vector store only for cross-session or out-of-scope long-term knowledge.
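A minimal sketch of this split in Python, under stated assumptions. `TwoTierMemory`, `note`, `archive`, `retrieve`, `build_prompt`, and `end_session` are hypothetical names, and the word-overlap `score` is a toy stand-in for a real embedding model and vector database. The structure is what matters. The scratchpad is copied into every prompt in full. Long-term memory reaches the prompt only through top-k retrieval. Only facts meant to outlive the task are promoted to the store when the session ends.

```python
from dataclasses import dataclass, field


def score(query: str, text: str) -> int:
    # Toy similarity (assumption): number of shared lowercase words.
    # A real system would use an embedding model and a vector database here.
    return len(set(query.lower().split()) & set(text.lower().split()))


@dataclass
class TwoTierMemory:
    # Working memory: the current task's scratchpad and plan, always sent in full.
    scratchpad: list[str] = field(default_factory=list)
    # Long-term memory: reachable only through top-k retrieval.
    long_term: list[str] = field(default_factory=list)

    def note(self, text: str) -> None:
        """Record an intermediate result or plan step for the current task."""
        self.scratchpad.append(text)

    def archive(self, text: str) -> None:
        """Store knowledge meant to outlive the current task or session."""
        self.long_term.append(text)

    def retrieve(self, query: str, k: int) -> list[str]:
        """Return the k long-term entries that score highest against the query."""
        ranked = sorted(self.long_term, key=lambda t: score(query, t), reverse=True)
        return ranked[:k]

    def build_prompt(self, task: str, k: int = 2) -> str:
        # The scratchpad is included unconditionally; only long-term memory is filtered.
        lines = [f"Task: {task}", "Scratchpad:"]
        lines += [f"- {s}" for s in self.scratchpad]
        lines.append("Retrieved background:")
        lines += [f"- {t}" for t in self.retrieve(task, k)]
        return "\n".join(lines)

    def end_session(self, durable: list[str]) -> None:
        """Promote selected scratchpad entries to long-term memory, then reset."""
        for s in durable:
            if s in self.scratchpad:
                self.archive(s)
        self.scratchpad.clear()


mem = TwoTierMemory()
mem.archive("user prefers metric units")
mem.archive("billing api allows 10 requests per second")
mem.note("plan: fetch invoice, convert currency, report total")
mem.note("invoice subtotal = 412.50")
mem.note("user confirmed default currency is EUR")
prompt = mem.build_prompt("report the invoice total in metric units", k=1)
print(prompt)
# At task end, keep only what should outlive the session; the rest is discarded.
mem.end_session(durable=["user confirmed default currency is EUR"])
```

Per-task intermediates like the subtotal never touch the vector store, so they cannot be lost to a ranking miss while the task is running.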
Journey Context:
Agents often offload too much to vector databases to save context-window space, retrieving everything through RAG. But RAG adds retrieval latency and imposes a hard boundary on what the LLM's attention mechanism can see: anything outside the top-k results is invisible to the model. In a multi-step task, a critical intermediate variable that fails to rank in the top-k is effectively lost, and the task breaks. Treat the context window as working memory (fast, lossless, fully attended) and the vector store as long-term memory (slower, lossy, reachable only through retrieval). Don't prematurely optimize context-window size at the expense of task coherence.
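The failure mode can be reproduced with a toy retriever. This sketch assumes a shared-word count as the similarity score (a hypothetical stand-in for embeddings), and the scratchpad and background contents are made up for illustration. When the intermediate variables are embedded into the same store as background notes, a descriptive note outranks `tax_rate = 0.2` and the variable falls outside the top-k. Keeping the scratchpad in the prompt avoids the problem regardless of how retrieval ranks it.

```python
def tokens(text: str) -> set[str]:
    # Toy lexical stand-in for an embedding model (assumption, not a real retriever).
    return set(text.lower().split())


def top_k(store: list[str], query: str, k: int) -> list[str]:
    """Rank stored snippets by shared-token count and keep the best k."""
    q = tokens(query)
    return sorted(store, key=lambda s: len(q & tokens(s)), reverse=True)[:k]


def missing(visible: list[str], required: list[str]) -> list[str]:
    """Return the required facts that never reach the prompt."""
    return [fact for fact in required if fact not in visible]


# Intermediate results from earlier steps of the current task (hypothetical).
scratchpad = ["subtotal = 412.50", "tax_rate = 0.2"]
# Background notes sharing the same vector store (hypothetical).
background = [
    "invoice total is computed from subtotal and tax",
    "billing emails go to the finance team",
]
query = "compute invoice total from subtotal and tax rate"

# RAG-only design: scratchpad entries are embedded alongside everything else.
rag_visible = top_k(scratchpad + background, query, k=2)
# Working-memory design: the scratchpad stays in the prompt; only background is retrieved.
ctx_visible = scratchpad + top_k(background, query, k=1)

print(missing(rag_visible, scratchpad))  # ['tax_rate = 0.2']
print(missing(ctx_visible, scratchpad))  # []
```

A real embedding model would rank these snippets differently, so the specific miss here is an artifact of the toy scorer. The general point still holds: whatever falls below the cutoff never reaches the model, and short of retrieving everything, no choice of k guarantees that a given intermediate value survives.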
Lifecycle
2026-06-22T00:19:02.703986+00:00— report_created — created