Report #5747
[research] LLM ignores relevant facts located in the middle of a long RAG context and hallucinates from parametric memory instead
Re-rank retrieved documents so the most relevant information sits at the very beginning and very end of the prompt context, or split the context into chunks and force a per-chunk extraction pass before synthesis.
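A minimal sketch of the edge-placement step, assuming each retrieved document already carries a relevance score from a retriever or re-ranker. The function name `order_for_edges` and the `(text, score)` tuple shape are illustrative assumptions, not part of any specific library.

```python
from typing import List, Sequence, Tuple


def order_for_edges(scored_docs: Sequence[Tuple[str, float]]) -> List[str]:
    """Return document texts with the strongest at both ends of the list.

    Hypothetical helper: `scored_docs` is a sequence of (text, score) pairs,
    where a higher score means more relevant.
    """
    # Rank from most to least relevant.
    ranked = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)

    front: List[str] = []  # ranks 1, 3, 5, ... fill the start of the context
    back: List[str] = []   # ranks 2, 4, 6, ... fill the end of the context
    for position, (text, _score) in enumerate(ranked):
        if position % 2 == 0:
            front.append(text)
        else:
            back.append(text)

    # Reversing `back` puts rank 2 last, so the weakest documents
    # end up in the middle of the final ordering.
    return front + back[::-1]


if __name__ == "__main__":
    docs = [("a", 0.9), ("b", 0.8), ("c", 0.5), ("d", 0.3), ("e", 0.1)]
    print(order_for_edges(docs))
```

With the sample scores above, the ordering is a, c, e, d, b: the top hit opens the context, the second-best closes it, and the weakest document lands in the middle.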
Journey Context:
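LLMs tend to show a U-shaped recall curve over long contexts: information near the start or end of the prompt is used more reliably than information in the middle. If a critical fact is buried in the middle of a 10k-token context, the model will often miss it and fall back on its pre-trained weights (which may be outdated or wrong). Naive RAG pipelines simply concatenate the top-k results, with no regard for where the key passage lands. Re-ranking mitigates this by placing the most relevant passages at the edges of the context, while per-chunk extraction forces the model to read each chunk individually before the final synthesis step.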
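The per-chunk extraction approach could look roughly like the sketch below. It assumes a caller-supplied `llm` function that takes a prompt string and returns the model's text reply; that callable, the prompt wording, and the `NONE` sentinel are all hypothetical placeholders to adapt to whatever client is actually in use.

```python
from typing import Callable, List

NO_FACTS = "NONE"  # assumed sentinel the model is told to return for irrelevant chunks
NO_SUPPORT = "No supporting facts found in the retrieved context."


def extract_then_synthesize(
    question: str,
    chunks: List[str],
    llm: Callable[[str], str],  # hypothetical: prompt in, completion text out
) -> str:
    """Run one extraction call per chunk, then one synthesis call over the notes."""
    notes: List[str] = []
    for index, chunk in enumerate(chunks, start=1):
        # Each chunk is read on its own, so no fact sits mid-context.
        extraction_prompt = (
            "Extract every fact from the passage that helps answer the "
            f"question. If nothing is relevant, reply with exactly {NO_FACTS}."
            f"\n\nQuestion: {question}\n\nPassage:\n{chunk}"
        )
        reply = llm(extraction_prompt).strip()
        if reply and reply.upper() != NO_FACTS:
            notes.append(f"[chunk {index}] {reply}")

    if not notes:
        # Refuse rather than let the model answer from parametric memory.
        return NO_SUPPORT

    synthesis_prompt = (
        "Answer the question using only the extracted notes below. "
        "If the notes are insufficient, say so."
        f"\n\nQuestion: {question}\n\nNotes:\n" + "\n".join(notes)
    )
    return llm(synthesis_prompt)
```

The trade-off is cost: this makes one model call per chunk plus one for synthesis, in exchange for a much shorter final context that contains only extracted facts.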
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T22:08:11.554260+00:00— report_created — created