Report #82143
[counterintuitive] Why does the model fail to use information from the middle of a long context despite having a large context window?
Place critical information at the beginning or end of the prompt; use RAG to minimize context length; if you must include long documents, duplicate key instructions at both the start and end of the context.
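A minimal sketch of the placement advice above, assuming a plain-text prompt assembled by hand. The function name `build_prompt`, the section labels, and the `[Document N]` markers are illustrative choices, not part of any library or model API. It states the key instructions once before the documents and repeats them after, so they never sit only in the middle of a long context.

```python
def build_prompt(instructions: str, documents: list[str], question: str) -> str:
    """Assemble a prompt with the key instructions at both the start and the end.

    Hypothetical helper for illustration: the labels and layout are assumptions,
    so adapt them to whatever prompt format your model expects.
    """
    instructions = instructions.strip()

    # Long material goes in the middle, where recall is weakest.
    doc_block = "\n\n".join(
        f"[Document {i}]\n{doc.strip()}"
        for i, doc in enumerate(documents, start=1)
    )

    parts = [
        f"Instructions:\n{instructions}",                 # start of context
        doc_block,                                        # middle of context
        f"Reminder of the instructions:\n{instructions}", # repeated near the end
        f"Question:\n{question.strip()}",                 # the very end
    ]
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        "Answer only from the documents. Say 'not found' if unsure.",
        ["First long document...", "Second long document..."],
        "What is the refund policy?",
    )
    print(prompt)
```

Whether the repeated instructions actually help depends on the model and the task, so treat this as a starting point to measure rather than a guaranteed fix.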
Journey Context:
Developers often assume that a model with a 128K context window can reliably access information anywhere in that window. Empirical research shows a strong U-shaped retrieval curve: models reliably use information at the very beginning and very end of the context but often miss information in the middle. This is not a bug but a property of how transformer attention distributes across positions. Training data tends to place salient information at the beginnings and ends of documents, so the model learns this prior. Increasing the context window size does not fix this; the effect persists even at 128K+ tokens. "Just put it all in the context" is not a solution for reliable information access. The counterintuitive implication is that a longer context can be worse than a shorter one if it pushes critical information into the dead zone in the middle.
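One way to check whether the U-shaped curve affects your own setup is to plant a known fact (a "needle") at different relative depths in filler text and see where the model stops finding it. The sketch below is an assumption-laden harness, not a standard benchmark: `place_needle` and `depth_sweep` are hypothetical names, and `ask` is a callable you supply that sends a prompt to your model and returns its text reply.

```python
from typing import Callable, Sequence


def place_needle(filler: Sequence[str], needle: str, depth: float) -> str:
    """Insert `needle` among `filler` paragraphs at a relative depth.

    depth=0.0 puts the needle first, depth=1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))
    paragraphs = list(filler[:index]) + [needle] + list(filler[index:])
    return "\n\n".join(paragraphs)


def depth_sweep(
    ask: Callable[[str], str],  # hypothetical: wraps your own model call
    filler: Sequence[str],
    needle: str,
    question: str,
    expected: str,
    depths: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> dict[float, bool]:
    """Return, per depth, whether the reply contained the expected string."""
    results: dict[float, bool] = {}
    for depth in depths:
        context = place_needle(filler, needle, depth)
        reply = ask(f"{context}\n\nQuestion: {question}")
        # Crude substring check; swap in a stricter scorer if needed.
        results[depth] = expected.lower() in reply.lower()
    return results
```

A single sweep is noisy. Repeating it with different filler and needles, and with filler long enough to approach your real context length, gives a more trustworthy picture of where recall drops.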
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:28:15.035500+00:00— report_created — created