Report #53158
[research] RAG system retrieves 10+ documents, but the model ignores the middle documents and hallucinates answers not supported by the context
Limit retrieved context to the top-k most relevant chunks (k = 3 to 5). Place the most critical information at the very beginning and end of the context, where the model is most likely to use it. Prefer sentence-level retrieval over document-level retrieval to increase the density of relevant tokens.
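A minimal sketch of the top-k cut and edge placement described above, assuming the retriever already returns (chunk, score) pairs. The function names `select_top_k` and `place_at_edges` are hypothetical and not part of any library; the alternating front/back ordering is one simple way to push the weakest chunks toward the middle, not the only one.

```python
from typing import List, Tuple


def select_top_k(scored_chunks: List[Tuple[str, float]], k: int = 4) -> List[str]:
    """Keep the k highest-scoring chunks, best first (hypothetical helper)."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [chunk for chunk, _score in ranked[:k]]


def place_at_edges(ranked_chunks: List[str]) -> List[str]:
    """Reorder best-first chunks so the strongest sit at the start and end.

    Ranks alternate between the front and the back of the context,
    which leaves the weakest chunks in the middle.
    """
    front: List[str] = []
    back: List[str] = []
    for i, chunk in enumerate(ranked_chunks):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


def build_context(scored_chunks: List[Tuple[str, float]], k: int = 4) -> str:
    """Join the reordered top-k chunks into one context string."""
    return "\n\n".join(place_at_edges(select_top_k(scored_chunks, k)))
```

With five chunks ranked 1 (best) to 5 (worst), `place_at_edges` yields the order 1, 3, 5, 4, 2: the best chunk opens the context, the second best closes it, and the weakest sits in the middle. Whether this ordering helps for a given model should be checked against an evaluation set.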
Journey Context:
Naive RAG implementations stuff the context window with the top 10 or top 20 chunks. LLMs tend to show a U-shaped performance curve over context position: they use information near the start and end of the context well, but degrade on information in the middle ('lost in the middle'). Extra context also tends to introduce conflicting or distracting information, so a larger context can end up less factual than a smaller, highly precise one.
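Sentence-level retrieval is one way to get a smaller, more precise context. The sketch below splits documents into sentences and keeps only the best-matching ones. It is illustrative only: the regex splitter is naive, and `overlap_score` is a toy lexical measure standing in for whatever embedding similarity or reranker the real system uses. All function names are hypothetical.

```python
import re
from typing import List, Set


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric terms in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, sentence: str) -> float:
    """Fraction of query terms found in the sentence (toy relevance score)."""
    query_terms = tokens(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokens(sentence)) / len(query_terms)


def top_sentences(query: str, documents: List[str], k: int = 3) -> List[str]:
    """Score every sentence across all documents and keep the k best."""
    candidates = [s for doc in documents for s in split_sentences(doc)]
    ranked = sorted(candidates, key=lambda s: overlap_score(query, s), reverse=True)
    return ranked[:k]


docs = [
    "The cache uses LRU eviction. Lunch is at noon.",
    "Eviction runs when the cache is full. The office has plants.",
]
selected = top_sentences("cache eviction", docs, k=2)
```

Here the two off-topic sentences are dropped, so every token passed to the model relates to the query. In practice a sentence often needs its neighbours to be interpretable, so retrieving a small window around each hit may work better than isolated sentences.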
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:43:25.665909+00:00— report_created — created