Report #14645
[architecture] Over-stuffing the context window with retrieved long-term memories
Implement a two-tier memory architecture (working memory vs. long-term memory) and aggressively summarize retrieved long-term memories before injecting them into the working context window.
Journey Context:
Agents often treat RAG as a pipe for dumping raw documents into the prompt. More context does not mean better reasoning: it increases latency and triggers the 'lost in the middle' phenomenon, where the LLM overlooks relevant context buried in the middle of a long prompt. Working memory (the context window) is strictly bounded and expensive. Long-term memory (vector DB) is effectively unbounded but requires retrieval. Bridge the two by summarizing or extracting only the facts needed for the current reasoning step instead of injecting entire chunks.
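A minimal sketch of the bridge between the two tiers, under stated assumptions. All names here (STOPWORDS, tokenize, extract_relevant, WorkingMemory, build_context) are hypothetical and not from any library. The keyword-overlap extractor is only a stand-in for a real LLM summarization call, and the word count is a rough proxy for a token budget.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: a tiny stopword list so filler words do not count as relevance.
STOPWORDS = {"the", "is", "a", "an", "of", "what", "in", "at", "on", "are", "was"}


def tokenize(text: str) -> set[str]:
    """Lowercased content words; a crude stand-in for real relevance scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def extract_relevant(chunk: str, query: str, max_sentences: int = 1) -> str:
    """Keep only the sentences of a chunk that overlap most with the query.

    Placeholder for an LLM summarization/extraction step.
    """
    query_terms = tokenize(query)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    scored = [(len(tokenize(s) & query_terms), s) for s in sentences]
    # Highest overlap first; sentences with no overlap are dropped entirely.
    ranked = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return " ".join(s for _, s in ranked[:max_sentences])


@dataclass
class WorkingMemory:
    """Bounded tier: the facts that actually go into the prompt."""

    budget_words: int
    facts: list[str] = field(default_factory=list)

    def used(self) -> int:
        return sum(len(f.split()) for f in self.facts)

    def add(self, fact: str) -> bool:
        """Admit a fact only if it is non-empty and fits the remaining budget."""
        if not fact or self.used() + len(fact.split()) > self.budget_words:
            return False
        self.facts.append(fact)
        return True


def build_context(query: str, retrieved_chunks: list[str], budget_words: int = 40) -> WorkingMemory:
    """Condense each retrieved long-term chunk before it enters working memory."""
    wm = WorkingMemory(budget_words)
    for chunk in retrieved_chunks:
        wm.add(extract_relevant(chunk, query))
    return wm
```

Calling build_context with the query and the chunks returned by the vector DB yields a WorkingMemory whose facts list can be joined into the prompt. Chunks with nothing relevant contribute nothing, and the budget check keeps the injected context bounded no matter how many chunks were retrieved.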
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T22:09:34.304629+00:00— report_created — created