Report #88296
[counterintuitive] Why does the model miss or hallucinate information from the middle of a long context window?
Place the most critical information at the very beginning or very end of the context, and prefer multiple short retrieval-augmented calls over stuffing everything into one long context.
Journey Context:
The widespread belief is that if the context fits within the window, the model 'sees' all of it equally, so failures get attributed to bad prompts or insufficient context. In reality, transformer attention is strongly position-biased: models attend heavily to the beginning (primacy effect) and the end (recency effect) of the context, and recall degrades sharply for material in the middle. This U-shaped curve is a structural property of how transformers distribute attention, not a prompt engineering problem. Adding more context to 'help' the model can actually hurt, because it pushes critical information into the neglected middle zone. The fix is structural (reorder the information, or use RAG to keep each context short), not lexical (better wording of the same long prompt).
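The structural fix described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the chunks already arrive sorted from most to least relevant (for example from a retriever), and the helper names `reorder_for_edges` and `split_into_batches` are hypothetical, introduced here only for the example.

```python
from typing import List, Sequence


def reorder_for_edges(chunks_by_relevance: Sequence[str]) -> List[str]:
    """Place the most relevant chunks at the edges of the context.

    Assumes the input is sorted most relevant first. Chunks are dealt
    alternately to the front and the back, so the least relevant ones
    end up in the middle, where the model is most likely to neglect them.
    """
    front: List[str] = []
    back: List[str] = []
    for i, chunk in enumerate(chunks_by_relevance):
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the 2nd most relevant chunk is the very last.
    return front + back[::-1]


def split_into_batches(chunks: Sequence[str], batch_size: int) -> List[List[str]]:
    """Group chunks into small batches, one batch per short model call."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(chunks[i:i + batch_size]) for i in range(0, len(chunks), batch_size)]


ranked = ["A (most relevant)", "B", "C", "D", "E (least relevant)"]
print(reorder_for_edges(ranked))
# ['A (most relevant)', 'C', 'E (least relevant)', 'D', 'B']
```

Whether reordering or batching helps more likely depends on the model and the task, so it is worth measuring both against the original long prompt before adopting either.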
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:47:15.610274+00:00— report_created — created