Report #9769
[architecture] Agent context window overflow from injecting all retrieved memories
Implement a two-tier memory architecture: use the LLM context window for active, high-salience working memory, and a vector store or graph for long-term archival. Promote a memory into the context window only when it passes a relevance and recency scoring filter.
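One possible shape for the relevance and recency filter is sketched below. It is an illustration only, not a reference implementation: the `Memory` record, the `score` and `promote` functions, the 0.7/0.3 weighting, the one-hour half-life, and the 0.5 threshold are all assumptions chosen for the example, and the embeddings are assumed to come from whatever embedding model the agent already uses.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    embedding: list[float]  # assumed to come from the agent's embedding model
    last_access: float      # Unix timestamp in seconds


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(memory, query_embedding, now, half_life_s=3600.0, w_relevance=0.7):
    """Blend relevance with an exponentially decaying recency term.

    The weights and half-life are illustrative defaults, not tuned values.
    """
    relevance = cosine(memory.embedding, query_embedding)
    recency = 0.5 ** ((now - memory.last_access) / half_life_s)
    return w_relevance * relevance + (1.0 - w_relevance) * recency


def promote(candidates, query_embedding, now, k=3, threshold=0.5):
    """Return at most k memories whose score clears the threshold, best first."""
    scored = [(score(m, query_embedding, now), m) for m in candidates]
    scored = [(s, m) for s, m in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]
```

In this sketch, `candidates` would be the hits returned by the long-term store for the current query. Anything that falls below the threshold stays archived and never reaches the prompt.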
Journey Context:
Developers often treat the context window as the sole memory store, which leads to token limit errors and attention dilution. Conversely, relying purely on a vector database for every query adds retrieval latency and loses the nuance of the immediate conversation. The tradeoff is between the speed and coherence of keeping state in context and the capacity of external stores. The right call is a working/long-term memory split, where the context window acts as an LRU cache in front of the external store and holds only immediately actionable state.
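The cache analogy can be made concrete with a small sketch. Everything here is hypothetical: `WorkingMemory`, `touch`, and `render` are names invented for the example, a plain dict stands in for the vector store or graph, and `count_tokens` is a crude whitespace count where a real agent would use its model's tokenizer.

```python
from collections import OrderedDict


def count_tokens(text):
    # Crude stand-in: whitespace word count, not a real tokenizer.
    return len(text.split())


class WorkingMemory:
    """Token-budgeted LRU cache in front of a long-term store (illustrative)."""

    def __init__(self, long_term, budget_tokens):
        self.long_term = long_term      # dict of id -> text, standing in for the external store
        self.budget_tokens = budget_tokens
        self.active = OrderedDict()     # id -> text, least recently used first

    def used_tokens(self):
        return sum(count_tokens(t) for t in self.active.values())

    def touch(self, memory_id):
        """Load or refresh a memory, then evict LRU entries while over budget."""
        if memory_id in self.active:
            self.active.move_to_end(memory_id)
        else:
            self.active[memory_id] = self.long_term[memory_id]
        # Always keep at least the entry just touched, even if it alone exceeds the budget.
        while self.used_tokens() > self.budget_tokens and len(self.active) > 1:
            self.active.popitem(last=False)  # evicted text remains in long_term

    def render(self):
        """Text to place in the prompt, oldest entry first."""
        return "\n".join(self.active.values())
```

Eviction only drops a memory from the prompt. The archived copy is untouched, so a later query can promote it again. A strict LRU policy is the simplest choice for the sketch; evicting by a combined relevance and recency score would be a reasonable variation.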
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T09:06:31.290832+00:00— report_created — created