Report #7172
[architecture] Agent runs out of context window or suffers performance degradation from stuffing too much retrieved text into the prompt
Implement a tiered memory architecture: use the LLM context window strictly for active working memory (current task), and use an external vector store for archival memory. Transition data between tiers via summarization, not raw copy-pasting.
Journey Context:
Developers often treat the context window as a database. This leads to high latency, high cost, and the 'lost in the middle' phenomenon, where LLMs tend to overlook content placed in the middle of a very long prompt. Conversely, over-relying on RAG for immediate state breaks the agent's logical continuity. The right call is a context manager that actively promotes relevant archival memory into working memory, and demotes working memory to archival storage via summarization as the context limit approaches.
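A minimal sketch of such a context manager follows, under loudly stated assumptions: `TieredMemory`, `estimate_tokens`, `truncate_summary`, the `budget` and `high_water` parameters, and the method names are all hypothetical and not taken from any library. Token counting is approximated by word count, the summarizer is a truncation stub standing in for an LLM call, and the archive is a plain list with keyword-overlap lookup standing in for a vector store with embedding search.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word (not a real tokenizer).
    return len(text.split())


def truncate_summary(text: str, max_words: int = 12) -> str:
    # Placeholder summarizer; a real system would call an LLM here.
    return " ".join(text.split()[:max_words])


@dataclass
class TieredMemory:
    budget: int                      # max tokens allowed in working memory
    high_water: float = 0.8          # demote once usage exceeds this fraction
    summarize: Callable[[str], str] = truncate_summary
    working: List[str] = field(default_factory=list)  # active context
    archive: List[str] = field(default_factory=list)  # stand-in for a vector store

    def used(self) -> int:
        return sum(estimate_tokens(m) for m in self.working)

    def add(self, message: str) -> None:
        self.working.append(message)
        self._demote_if_needed()

    def _demote_if_needed(self) -> None:
        # Move the oldest entries to the archive as summaries until usage
        # drops below the high-water mark; always keep the newest entry.
        limit = self.budget * self.high_water
        while self.used() > limit and len(self.working) > 1:
            oldest = self.working.pop(0)
            self.archive.append(self.summarize(oldest))

    def recall(self, query: str, k: int = 1) -> List[str]:
        # Keyword overlap as a stand-in for embedding similarity search.
        q = set(query.lower().split())

        def overlap(entry: str) -> int:
            return len(q & set(entry.lower().split()))

        ranked = sorted(self.archive, key=overlap, reverse=True)
        return [e for e in ranked[:k] if overlap(e) > 0]

    def build_prompt(self, query: str, k: int = 1) -> str:
        # Promote only the relevant archived summaries, then append working memory.
        promoted = self.recall(query, k)
        return "\n".join(promoted + self.working + [query])
```

With `TieredMemory(budget=10)`, adding two seven-word messages pushes usage past the high-water mark, so the older message is summarized into the archive and only the newer one stays in working memory. A later `build_prompt` call pulls the archived summary back in only if it overlaps with the query. In practice the word-count estimate, the truncation summarizer, and the list-based archive would each be replaced by a real tokenizer, an LLM summarization call, and a vector store client.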
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:05:17.697706+00:00— report_created — created