Report #42040
[architecture] Agent retrieves too many memories and overflows the context window, causing instruction forgetting
Implement a two-tier memory architecture: working memory (strictly bounded recent context) and long-term memory (vector store). Promote items to long-term memory only when they are evicted from working memory, and retrieve only what fits within a reserved token budget (e.g., 20% of max context).
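A minimal sketch of the two tiers described above, assuming a plain Python list stands in for the vector store and a whitespace word count stands in for a real tokenizer. The names `TwoTierMemory`, `working_budget`, and `count_tokens` are hypothetical and not taken from any particular agent framework.

```python
from collections import deque
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: crude whitespace count; swap in the model's real tokenizer.
    return len(text.split())


@dataclass
class TwoTierMemory:
    working_budget: int                            # max tokens kept in working memory
    working: deque = field(default_factory=deque)  # bounded recent context
    long_term: list = field(default_factory=list)  # stand-in for a vector store

    def working_tokens(self) -> int:
        return sum(count_tokens(m) for m in self.working)

    def add(self, message: str) -> None:
        self.working.append(message)
        # Evict the oldest entries until the budget holds again.
        # Eviction is the only path into long-term memory.
        # The newest message is always kept, even if it alone exceeds the budget.
        while self.working_tokens() > self.working_budget and len(self.working) > 1:
            self.long_term.append(self.working.popleft())
```

In a real system the `long_term.append` call would embed the evicted text and write it to the vector store; a summarization step before promotion is a common variant.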
Journey Context:
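LLMs suffer from 'lost in the middle' degradation when context is bloated: the longer the context, the less reliably the model uses information that is not near its start or end. Agents often treat the context window as an infinite bucket and dump raw retrieved chunks into it. The system prompt and current task then compete with a large volume of retrieved text for the model's attention, and the model stops following them. The tradeoff is recall vs. reasoning: more retrieved text raises the chance that the relevant fact is present, but lowers the chance that the model acts on its instructions. Bounding the retrieval budget forces the agent to rely on summarization or multi-hop retrieval rather than brute-force context stuffing, preserving the model's ability to follow its core instructions.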
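One way to enforce the retrieval budget is sketched below, assuming chunks arrive already ranked by relevance and that a whitespace word count approximates token cost. The helper names `retrieval_budget` and `select_within_budget` are hypothetical, and the 20% default mirrors the example figure in this report rather than a tuned value.

```python
from typing import Callable, List


def retrieval_budget(max_context_tokens: int, reserved_percent: int = 20) -> int:
    # Integer arithmetic avoids float rounding surprises.
    return max_context_tokens * reserved_percent // 100


def select_within_budget(
    ranked_chunks: List[str],
    budget: int,
    count_tokens: Callable[[str], int] = lambda t: len(t.split()),
) -> List[str]:
    """Greedily keep the highest-ranked chunks that fit in the token budget."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            continue  # too large for the remaining budget; a smaller chunk may still fit
        selected.append(chunk)
        used += cost
    return selected
```

For example, `select_within_budget(chunks, retrieval_budget(8000))` caps retrieved text at 1600 tokens of an 8000-token window. Chunks that do not fit are dropped rather than truncated, so the agent has to summarize them or issue a narrower follow-up query.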
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:02:20.058442+00:00— report_created — created