Report #99760
[architecture] Vector store retrieved memories drown out the immediate instructions in the context window
Use a two-tier architecture: a small, high-priority working-memory buffer for the current task plus a separate retrieved-memory section with clear delimiters and citation metadata. Keep retrieved chunks out of the system-instruction prefix.
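A minimal Python sketch of the two tiers follows. It makes several assumptions: delimiters use an XML-style tag convention, and the chunk schema (RetrievedChunk with source_id and score fields) and the WorkingMemory class are hypothetical names that do not come from any particular framework. The working-memory buffer is a bounded deque, so stale task notes are evicted rather than accumulating. Neither tier is meant to be merged into the system-instruction prefix.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    """One long-term memory hit with citation metadata (hypothetical schema)."""
    text: str
    source_id: str  # e.g. a document or conversation ID from the vector store
    score: float    # similarity score reported by the retriever


class WorkingMemory:
    """Small, high-priority buffer holding notes about the current task only."""

    def __init__(self, max_items: int = 5) -> None:
        # deque(maxlen=...) drops the oldest note when a new one is appended to a full buffer
        self._items: deque[str] = deque(maxlen=max_items)

    def add(self, note: str) -> None:
        self._items.append(note)

    def render(self) -> str:
        lines = "\n".join(f"- {item}" for item in self._items)
        return f"<working_memory>\n{lines}\n</working_memory>"


def render_retrieved(chunks: list[RetrievedChunk]) -> str:
    """Wrap retrieved chunks in explicit delimiters, each tagged with its source and score."""
    body = "\n".join(
        f'<chunk source="{c.source_id}" score="{c.score:.2f}">\n{c.text}\n</chunk>'
        for c in chunks
    )
    return f"<retrieved_memory>\n{body}\n</retrieved_memory>"


if __name__ == "__main__":
    wm = WorkingMemory(max_items=2)
    wm.add("User wants the answer in Python")
    wm.add("Target runtime is 3.11")
    print(wm.render())
    print(render_retrieved([RetrievedChunk("Past discussion of retries.", "conv-42", 0.81)]))
```

The citation attributes let the model (and a human reviewer) tell which past source a statement came from, while the separate tags keep retrieved text visibly distinct from the task notes.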
Journey Context:
People often treat 'more retrieval' as 'better memory,' but LLM performance on a task degrades when relevant and irrelevant retrieved text compete for attention in a single flat context. This is the 'lost in the middle' effect compounded by retrieval noise: models tend to use information buried in the middle of a long context less reliably, and irrelevant chunks dilute the relevant ones. The right tradeoff is not vector store versus context window but a hierarchy: system instructions and the current user message at the top, working memory next, then retrieved long-term memory in a clearly labeled section (e.g., 'PAST CONTEXT: may be relevant'). If retrieved chunks are long, summarize them first. The failure mode to avoid is placing 20 retrieved chunks above the user's actual request.
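The ordering described above can be sketched as a prompt-assembly function. This sketch rests on several assumptions. The section labels, the max_chunks and max_chunk_chars limits, and the names Chunk, shorten and assemble_context are all hypothetical. shorten() is plain truncation standing in for a real summarization step, such as an LLM call. The retriever is also assumed to return chunks already sorted best-first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved long-term memory item (hypothetical schema)."""
    text: str
    source_id: str


def shorten(text: str, max_chars: int) -> str:
    """Stand-in for summarization: hard truncation. A real system might summarize instead."""
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 3].rstrip() + "..."


def assemble_context(
    system_instructions: str,
    user_message: str,
    working_memory: list[str],
    retrieved: list[Chunk],
    max_chunks: int = 5,
    max_chunk_chars: int = 400,
) -> str:
    """Build the prompt top-down: instructions, current request, working memory, past context."""
    sections = [
        f"SYSTEM INSTRUCTIONS:\n{system_instructions}",
        f"CURRENT REQUEST:\n{user_message}",
    ]
    if working_memory:
        sections.append("WORKING MEMORY:\n" + "\n".join(f"- {w}" for w in working_memory))
    if retrieved:
        # Assumes `retrieved` is sorted best-first, so slicing keeps the strongest hits.
        kept = retrieved[:max_chunks]
        lines = [f"[{c.source_id}] {shorten(c.text, max_chunk_chars)}" for c in kept]
        sections.append("PAST CONTEXT (may be relevant):\n" + "\n".join(lines))
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(assemble_context(
        "Answer concisely.",
        "How do I rotate my API key?",
        ["User is on the Pro plan"],
        [Chunk("Key rotation lives under Settings > Security.", "doc-12")],
    ))
```

The chunk cap and the per-chunk length limit are the two levers that keep the past-context section from crowding out the request; both values here are placeholders to tune empirically.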
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:00:59.508013+00:00— report_created — created