Report #95484
[counterintuitive] Why does the model ignore or hallucinate information from the middle of a long context window?
Place critical information at the very beginning or very end of the context. For retrieval tasks over long documents, split the documents into chunks, rank the chunks by relevance to the query, and include only the top-ranked ones rather than dumping everything into context. Context window size is a maximum, not a recommendation.
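A minimal sketch of the chunk-and-rank step, assuming a plain word-overlap score as a stand-in for a real relevance ranker (embeddings, BM25, or a reranker would normally take its place). The names chunk_text, score, and top_chunks are hypothetical and exist only for illustration.

```python
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score(chunk: str, query: str) -> int:
    """Assumed toy relevance score: how often query terms occur in the chunk."""
    terms = set(re.findall(r"\w+", query.lower()))
    counts = Counter(re.findall(r"\w+", chunk.lower()))
    return sum(counts[t] for t in terms)


def top_chunks(text: str, query: str, k: int = 3, max_words: int = 50) -> list[str]:
    """Return the k highest-scoring chunks, best first."""
    chunks = chunk_text(text, max_words)
    return sorted(chunks, key=lambda c: score(c, query), reverse=True)[:k]


if __name__ == "__main__":
    filler = ["filler"] * 100
    doc = " ".join(filler + ["retry", "timeout", "is", "30", "seconds"] + filler)
    # Only the best-matching chunk goes into the prompt, not the whole document.
    print(top_chunks(doc, "retry timeout", k=1)[0][:40])
```

Only the selected chunks are sent to the model, so the prompt stays short and the relevant passage is less likely to land deep in the middle of a long context.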
Journey Context:
The common assumption is that a 128k+ context window means the model can use all of it equally well. Developers stuff entire codebases or documents into context expecting uniform attention. Research on long-context retrieval shows a U-shaped performance curve, often called the "lost in the middle" effect: models use information at the start and end of the context well, but accuracy degrades significantly for information in the middle. This isn't a bug in any one model; it is commonly attributed to how softmax attention distributes weight over long sequences. Adding more context to "help" the model can actually hurt, because it pushes critical information into the attention dead zone. The counterintuitive fix is often to use LESS context, not more, and to position what matters at the edges.
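One way to position what matters at the edges, sketched under the assumption that the passages are already ranked best-first: alternate them between the front and the back of the prompt so the weakest ones end up in the middle. The functions place_at_edges and build_prompt are hypothetical helpers, and restating the question at the end is an illustrative choice, not a verified fix.

```python
def place_at_edges(ranked: list[str]) -> list[str]:
    """Reorder items ranked best-first so the best sit at the start and end."""
    front: list[str] = []
    back: list[str] = []
    for i, item in enumerate(ranked):
        # Even ranks fill the front, odd ranks fill the back.
        (front if i % 2 == 0 else back).append(item)
    # Reverse the back half so the second-best item is the very last one.
    return front + back[::-1]


def build_prompt(question: str, ranked: list[str]) -> str:
    """Assemble a prompt with the question at both edges (assumed layout)."""
    body = "\n\n".join(place_at_edges(ranked))
    return f"Question: {question}\n\n{body}\n\nAnswer this question: {question}"


if __name__ == "__main__":
    print(place_at_edges(["A", "B", "C", "D", "E"]))
```

With five passages ranked A (best) through E (worst), A lands first, B lands last, and E sits in the middle, where a miss costs the least.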
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:50:54.819228+00:00— report_created — created