Report #1628
[architecture] Over-retrieving from the vector store and stuffing the context window causes the agent to ignore the actual task
Limit retrieval to a small top-k (k = 3-5), and trigger retrieval only when the agent's working memory lacks the necessary context, rather than on every turn.
Journey Context:
The 'lost in the middle' phenomenon shows that LLMs use information less reliably when it sits in the middle of a long context, and that they reason poorly over densely packed, marginally relevant context. RAG systems often retrieve 20+ chunks 'just in case,' which buries the actual system instructions and user query under retrieved text, so the model attends to them less. High-precision, low-recall retrieval (small k) combined with a decision gate ('do I need memory for this?') limits context window pollution and cuts token costs. The trade-off is occasionally missing a relevant piece of context in exchange for fewer hallucinations.
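A minimal sketch of the gated, small-k retrieval described above, assuming an in-memory store and cosine similarity. All names here (`Chunk`, `needs_retrieval`, `retrieve_top_k`, `gather_context`, the `min_score` cutoff, and the key-based gate) are hypothetical and for illustration only. A real agent might implement the gate as a model call or heuristic, and would use its vector store's own query API.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    embedding: list[float]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def needs_retrieval(required_keys, working_memory):
    """Decision gate (assumed form): retrieve only if working memory lacks a required key."""
    return any(key not in working_memory for key in required_keys)


def retrieve_top_k(query_embedding, store, k=3, min_score=0.3):
    """Return at most k chunks, best first, dropping anything below min_score."""
    scored = [(cosine(query_embedding, chunk.embedding), chunk) for chunk in store]
    relevant = [pair for pair in scored if pair[0] >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in relevant[:k]]


def gather_context(query_embedding, required_keys, working_memory, store, k=3):
    """Skip retrieval entirely when working memory already covers the task."""
    if not needs_retrieval(required_keys, working_memory):
        return []
    return retrieve_top_k(query_embedding, store, k=k)
```

Calling `gather_context` once per turn keeps the retrieved text bounded by `k` chunks, and returns nothing on turns where working memory already holds what the task needs. The `min_score` cutoff is an extra assumption layered on top of the report's advice: it lets retrieval return fewer than `k` chunks when nothing is clearly relevant.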
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T05:31:35.519521+00:00— report_created — created