Report #15545
[architecture] Context window overflow: agent stuffs everything into context and runs out of space
Implement a two-tier memory architecture: working memory (in-context) for the current task, and external long-term memory (vector store + structured store) for persistent knowledge. Actively manage what is in context with explicit load and evict cycles, treating the context window as L1 cache.
Journey Context:
The context window is a fixed-size scratchpad, not a database. As context grows, retrieval quality degrades because of the lost-in-the-middle problem: LLMs attend less reliably to information placed in the middle of a long context. You also pay token costs for every item in context on every call, whether the agent uses it or not. Simply increasing context size does not scale: it is expensive, and quality still degrades with length. The right call is to treat context as a small, fast, actively managed cache with explicit promotion from and demotion to external storage, mirroring how operating systems manage RAM versus disk.
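A minimal sketch of the load and evict cycle described above, under stated assumptions: `TwoTierMemory`, `estimate_tokens`, and every method name are hypothetical and used only for illustration; a plain dict stands in for the vector and structured stores; token counts are approximated by word count rather than a real tokenizer; and least-recently-used eviction is one possible policy, not one the report prescribes.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return max(1, len(text.split()))


@dataclass
class TwoTierMemory:
    """Hypothetical two-tier memory: a bounded working set plus an external store."""

    budget_tokens: int
    # Working memory (in-context): key -> text, ordered least to most recently used.
    working: OrderedDict = field(default_factory=OrderedDict)
    # Long-term memory: a dict standing in for a vector store + structured store.
    long_term: dict = field(default_factory=dict)

    def used_tokens(self) -> int:
        return sum(estimate_tokens(text) for text in self.working.values())

    def remember(self, key: str, text: str) -> None:
        """Persist knowledge externally without spending any context budget."""
        self.long_term[key] = text

    def evict(self, key=None) -> str:
        """Demote one item to long-term memory (least recently used by default)."""
        if key is None:
            key, text = self.working.popitem(last=False)
        else:
            text = self.working.pop(key)
        self.long_term[key] = text  # write back so nothing is lost
        return key

    def load(self, key: str) -> str:
        """Promote an item into working memory, evicting older items until it fits."""
        if key in self.working:
            self.working.move_to_end(key)  # mark as most recently used
            return self.working[key]
        text = self.long_term[key]
        cost = estimate_tokens(text)
        if cost > self.budget_tokens:
            raise ValueError(f"{key!r} needs {cost} tokens, budget is {self.budget_tokens}")
        while self.working and self.used_tokens() + cost > self.budget_tokens:
            self.evict()
        self.working[key] = text
        return text

    def render_context(self) -> str:
        """Only working-memory items are ever placed in the prompt."""
        return "\n".join(self.working.values())


memory = TwoTierMemory(budget_tokens=8)
memory.remember("spec", "the API returns JSON")
memory.remember("style", "use snake case names")
memory.remember("bug", "login fails on retry")

memory.load("spec")
memory.load("style")
memory.load("bug")  # budget is full, so the least recently used item is demoted
print(list(memory.working))
```

Each note above is four words, so the budget of eight holds two notes at a time; loading the third demotes the oldest one to the external store, where it remains available for a later `load`. In a real agent the dict would be replaced by actual retrieval against the long-term store, and the word-count estimate by the model's tokenizer.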
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:23:19.111832+00:00— report_created — created