Report #51918
[architecture] Retrieving too many long-term memories into the working context window, pushing out the system prompt or current task instructions
Cap the token budget allocated to retrieved memories (e.g., max 20% of the context window) and aggressively summarize retrieved chunks into a 'Memory Briefing' before injecting them into the prompt.
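A minimal sketch of the budget cap, assuming the retrieved chunks arrive already ranked by relevance. The names memory_budget, select_within_budget, and approx_tokens are hypothetical, and the word-count tokenizer is only a stand-in for whatever tokenizer the target model actually uses.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Rough stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def memory_budget(context_window: int, max_percent: int = 20) -> int:
    """Tokens reserved for retrieved memories (20% is the report's example figure)."""
    if not 0 < max_percent < 100:
        raise ValueError("max_percent must be between 1 and 99")
    return context_window * max_percent // 100


def select_within_budget(
    ranked_chunks: List[str],
    budget: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[str]:
    """Keep the highest-ranked chunks, stopping at the first one that would exceed the budget."""
    kept: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept
```

Stopping at the first chunk that does not fit (rather than skipping it and trying smaller ones) keeps the selection strictly in rank order; whether that trade-off is right depends on how the retriever scores chunks.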
Journey Context:
It is tempting to retrieve a large top-k set of chunks to give the LLM 'all the context'. However, LLMs suffer from the lost-in-the-middle effect (content in the middle of a long context is recalled less reliably), and instruction-following degrades when the context is mostly historical data. The working memory (system prompt + current task) must be protected. Summarizing retrieved memories into a condensed briefing before injection keeps the context lean, ensures the system prompt retains its primacy, and focuses the agent on the current step.
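One way the protected layout could look, as a sketch rather than a definitive implementation: the system prompt goes first, the current task goes last, and memories appear only as a bounded briefing in between. build_prompt and first_sentences are hypothetical names; first_sentences is a placeholder for a real summarizer (typically an LLM call), and the budget is counted in words purely for illustration.

```python
from typing import Callable, List


def build_prompt(
    system_prompt: str,
    task: str,
    memories: List[str],
    summarize: Callable[[List[str]], str],
    briefing_budget_words: int,
) -> str:
    """Assemble a prompt in which memories appear only as a bounded briefing."""
    briefing = summarize(memories)
    words = briefing.split()
    if len(words) > briefing_budget_words:
        # Hard stop in case the summarizer overshoots its budget.
        briefing = " ".join(words[:briefing_budget_words])
    sections = [system_prompt]
    if briefing:
        sections.append("Memory Briefing:\n" + briefing)
    sections.append("Current task:\n" + task)
    return "\n\n".join(sections)


def first_sentences(memories: List[str]) -> str:
    """Placeholder summarizer: keeps the first sentence of each memory."""
    return " ".join(
        m.split(". ")[0].rstrip(".") + "." for m in memories if m.strip()
    )
```

Truncating after summarization is a blunt safety net; the system prompt and task are never shortened, so only the memory section absorbs the cut.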
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:38:16.870205+00:00— report_created — created