Report #74825
[architecture] Agent context window polluted by stuffing too many retrieved memories
Implement a two-tier memory system: use the LLM context window for active, short-term working memory (current task state) and a vector store for long-term episodic and semantic memory. Promote long-term memories into the context window only through a retrieval step that enforces a hard token budget and a minimum relevance threshold.
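A minimal sketch of the promotion step described above, under stated assumptions: the vector store is stood in for by a plain in-memory list, similarity is cosine over toy embeddings, and tokens are approximated by whitespace-separated words. The names `Memory`, `promote`, `min_score`, and `token_budget` are hypothetical and not taken from any particular library; a real system would use its own embedding model, store, and tokenizer.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough token proxy.
    # Swap in the model's real tokenizer for accurate budgeting.
    return len(text.split())


def promote(query_embedding: list[float], store: list[Memory],
            min_score: float = 0.75, token_budget: int = 200) -> list[str]:
    """Select long-term memories to copy into the context window."""
    # 1. Relevance threshold: discard anything below min_score.
    scored = [(cosine(query_embedding, m.embedding), m) for m in store]
    scored = [(score, m) for score, m in scored if score >= min_score]
    # 2. Most relevant first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # 3. Hard token budget: skip any memory that would overflow it.
    selected, used = [], 0
    for _score, memory in scored:
        cost = estimate_tokens(memory.text)
        if used + cost > token_budget:
            continue  # a smaller, lower-ranked memory may still fit
        selected.append(memory.text)
        used += cost
    return selected


store = [
    Memory("user prefers dark mode", [1.0, 0.0]),
    Memory("deploy target is staging cluster", [0.9, 0.1]),
    Memory("unrelated note", [0.0, 1.0]),
]
```

Calling `promote([1.0, 0.0], store, token_budget=6)` keeps only the top-ranked memory, because the second one would exceed the budget and the third falls below the relevance threshold. Both limits are tuning knobs; the defaults here are placeholders, not recommendations.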
Journey Context:
Developers often treat the LLM context window as the primary database, dumping entire conversation histories or large retrieval results into it. This causes the agent to lose focus, hallucinate, or exceed token limits. The context window is expensive and has limited capacity, so it should be treated like an L1 cache, with the vector store as L2. The tradeoff: the context window gives the model direct attention over its contents but no persistence, while a vector store gives persistence but depends on accurate retrieval and adds latency. The right call is to keep the context window lean and strictly curate what enters it from the vector store.
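The cache analogy also applies on the write side: rather than letting conversation history grow without bound, working memory can be capped and its oldest turns demoted to long-term storage. The sketch below is one possible shape for that, not a definitive implementation. `WorkingMemory`, `count_tokens`, and the `archive` list (a stand-in for a real vector store write) are hypothetical names, and token counting is again a whitespace approximation.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a rough token proxy.
    return len(text.split())


class WorkingMemory:
    """Bounded short-term memory; overflow is demoted to an archive."""

    def __init__(self, max_tokens: int, archive: list[str]):
        self.max_tokens = max_tokens
        self.archive = archive  # stand-in for a long-term vector store
        self.turns: deque[str] = deque()
        self.used = 0

    def add(self, text: str) -> None:
        self.turns.append(text)
        self.used += count_tokens(text)
        # Evict oldest turns until back under budget, but always keep
        # the newest turn even if it alone exceeds the budget.
        while self.used > self.max_tokens and len(self.turns) > 1:
            oldest = self.turns.popleft()
            self.used -= count_tokens(oldest)
            self.archive.append(oldest)  # demote instead of discarding

    def render(self) -> str:
        """Text that would actually be placed in the context window."""
        return "\n".join(self.turns)


archive: list[str] = []
wm = WorkingMemory(max_tokens=5, archive=archive)
wm.add("a b c")
wm.add("d e")
wm.add("f g h")  # pushes usage to 8 tokens, so "a b c" is demoted
```

Oldest-first eviction is the simplest policy; a system that needs to keep pinned task state in context would want a different one.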
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:11:19.198219+00:00 - report_created - created