Report #13199
[research] RAG system fails to retrieve facts located in the middle of a long context window, leading to hallucinations
Re-rank retrieved documents to place the most relevant chunks at the very beginning and very end of the prompt context, or force the model to output a 'relevant snippet' before answering.
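A minimal sketch of the second option (forcing a 'relevant snippet' before the answer), assuming the retrieved chunks are plain strings. The function name `build_quote_first_prompt`, the `[chunk N]` labels, and the exact instruction wording are hypothetical illustrations, not part of any particular RAG library.

```python
from typing import Sequence


def build_quote_first_prompt(question: str, chunks: Sequence[str]) -> str:
    """Build a prompt that asks for a verbatim snippet before the answer.

    Hypothetical helper: the headings and wording below are assumptions
    and may need tuning for the model in use.
    """
    # Label each chunk so the quoted snippet can be traced back to its source.
    context = "\n\n".join(
        f"[chunk {i}]\n{chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Context:\n"
        f"{context}\n\n"
        f"Question: {question}\n\n"
        "First, copy the single most relevant snippet from the context "
        "verbatim under the heading 'Relevant snippet:'. If nothing in the "
        "context answers the question, write 'Relevant snippet: NONE'.\n"
        "Then answer under the heading 'Answer:' using only that snippet."
    )
```

Asking for the quote first makes the model locate the supporting text before it commits to an answer, and the 'NONE' escape gives it an alternative to answering from memory.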
Journey Context:
LLMs tend to show a U-shaped attention curve over long contexts: they attend heavily to the beginning (primacy) and end (recency) of the prompt and under-attend to the middle. If a crucial fact sits near the middle of a 100k-token context, the model is likely to hallucinate an answer from its parametric memory rather than read the middle chunk. Reranking mitigates this by placing high-signal data at the attention peaks.
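The reordering can be sketched as follows, assuming the chunks have already been sorted best-first by some reranker. The function name `reorder_for_edges` is hypothetical, and alternating between the front and the back is one simple way to do this, not the only one.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_edges(ranked: Sequence[T]) -> List[T]:
    """Place the most relevant items at both ends of the context.

    `ranked` is assumed to be sorted from most to least relevant.
    Items are dealt alternately to the front and the back, so the
    least relevant ones end up in the middle.
    """
    front: List[T] = []
    back: List[T] = []
    for i, item in enumerate(ranked):
        if i % 2 == 0:
            front.append(item)  # ranks 1, 3, 5, ... fill in from the start
        else:
            back.append(item)   # ranks 2, 4, 6, ... fill in from the end
    # Reverse the back half so rank 2 becomes the very last item.
    return front + back[::-1]


chunks_best_first = ["a", "b", "c", "d", "e"]
print(reorder_for_edges(chunks_best_first))  # ['a', 'c', 'e', 'd', 'b']
```

Here the best chunk (`a`) opens the context, the second best (`b`) closes it, and the weakest (`e`) lands in the middle where attention is lowest.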
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:10:32.938926+00:00— report_created — created