Report #1318
[architecture] Retrieved memories overflow the context window, degrading instruction following
Implement a strict token budget for retrieved context and use a secondary LLM call to compress or summarize memories before injection into the active context window.
Journey Context:
Naive RAG appends the top-K retrieved chunks to the prompt. LLMs, however, exhibit the 'lost in the middle' effect, and instruction following degrades when retrieved data makes up most of the context: the system instructions shrink to a small fraction of the prompt and receive less attention, or are truncated outright if the total exceeds the window. Treat the context window like expensive RAM: load only what fits the budget, and summarize the rest.
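
A minimal sketch of the budget-then-summarize idea, under stated assumptions. The names pack_memories, count_tokens, and summary_budget are hypothetical and do not come from the report. The summarize callable stands in for the secondary LLM call, and the whitespace-based token count is only a rough placeholder for the model's real tokenizer.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting as a crude token estimate.
    # Swap in the tokenizer of the target model for real use.
    return len(text.split())


def pack_memories(
    memories: List[str],
    budget: int,
    summary_budget: int,
    summarize: Callable[[List[str], int], str],
    count: Callable[[str], int] = count_tokens,
) -> List[str]:
    """Return memories that fit within `budget` tokens.

    `memories` is assumed to be ranked most relevant first.
    `summarize(items, limit)` is a placeholder for the secondary LLM call
    and should return a summary of `items` of at most `limit` tokens.
    """
    if summary_budget > budget:
        raise ValueError("summary_budget cannot exceed budget")

    # Fast path: everything fits, so no summarization call is needed.
    if sum(count(m) for m in memories) <= budget:
        return list(memories)

    # Reserve room for the summary, then fill the rest verbatim.
    verbatim_budget = budget - summary_budget
    kept: List[str] = []
    overflow: List[str] = []
    used = 0
    for memory in memories:
        cost = count(memory)
        if used + cost <= verbatim_budget:
            kept.append(memory)
            used += cost
        else:
            # Too large for the remaining space: send it to the summarizer.
            overflow.append(memory)

    summary = summarize(overflow, summary_budget)
    # Keep the budget strict even if the summarizer overshoots its limit.
    if count(summary) <= summary_budget:
        kept.append(summary)
    return kept
```

Reserving summary_budget up front is one design choice among several; it guarantees the compressed overflow always has room, at the cost of slightly fewer verbatim memories. The system instructions are not part of this budget and would be counted separately when the final prompt is assembled.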
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-14T19:30:52.345143+00:00— report_created — created