Report #42657
[counterintuitive] Why does the model ignore information I placed in the middle of a long prompt even though it is clearly within the context window?
Place critical instructions and key information at the beginning or end of the context window. For RAG pipelines, rank retrieved documents so the most relevant appear at the start or end of the context, not the middle. For long documents, restructure so essential content is at the edges.
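One way to apply the ranking advice above is to interleave a relevance-sorted list so that the strongest documents land at the two edges and the weakest fall in the middle. The following is a minimal sketch, not a reference implementation: the names `reorder_for_edges` and `build_prompt` are hypothetical, and it assumes the input is already sorted from most to least relevant by whatever retriever is in use.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_for_edges(ranked: Sequence[T]) -> List[T]:
    """Reorder items sorted most-relevant-first so the best sit at the edges.

    Even-ranked items (0, 2, 4, ...) fill the front in order; odd-ranked
    items (1, 3, 5, ...) fill the back in reverse, so rank 0 is first,
    rank 1 is last, and the least relevant items end up in the middle.
    """
    front: List[T] = []
    back: List[T] = []
    for rank, item in enumerate(ranked):
        if rank % 2 == 0:
            front.append(item)
        else:
            back.append(item)
    return front + back[::-1]


def build_prompt(instruction: str, ranked_docs: Sequence[str], question: str) -> str:
    """Hypothetical prompt layout: instruction first, question last,
    reordered documents in between."""
    parts = [instruction, *reorder_for_edges(ranked_docs), question]
    return "\n\n".join(parts)


if __name__ == "__main__":
    docs = ["d1", "d2", "d3", "d4", "d5"]  # d1 is the most relevant
    print(reorder_for_edges(docs))  # ['d1', 'd3', 'd5', 'd4', 'd2']
```

Whether this ordering helps for a given model and task is an empirical question, so it is worth comparing against the plain ranked order before adopting it.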
Journey Context:
The common mental model treats the context window as a uniform container: if information fits within the token limit, the model 'has' all of it equally. Liu et al. (2023) found a consistent U-shaped performance curve across multiple model families: accuracy is highest when the relevant information sits at the beginning (primacy effect) or end (recency effect) of the context, and it degrades significantly when that information sits in the middle. This is not just a prompt-wording problem; it reflects a positional bias in how these models use long sequences. Adding instructions like 'pay attention to everything' does not fix it, because the bias comes from the model rather than from the wording of the prompt. The practical implication is counterintuitive: in a 128K context window, information in the middle is significantly less accessible to the model than information in the first and last portions. Simply increasing context window size therefore does not proportionally increase usable context, and RAG systems that naively concatenate documents may bury the most relevant information in the worst possible position.
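The positional effect described above can be checked against a specific model by placing one known fact at different depths in otherwise irrelevant text and comparing answer accuracy per depth. The sketch below only builds the test prompts. The helper names `insert_at_depth` and `make_position_probes` are hypothetical, and the model call itself is left out because it depends on the client library in use.

```python
from typing import Dict, List, Sequence


def insert_at_depth(filler: Sequence[str], needle: str, depth: float) -> List[str]:
    """Insert `needle` into `filler` at a relative depth.

    depth=0.0 puts the needle first, depth=1.0 puts it last,
    and depth=0.5 puts it in the middle.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = int(depth * len(filler))
    return list(filler[:index]) + [needle] + list(filler[index:])


def make_position_probes(
    filler: Sequence[str],
    needle: str,
    question: str,
    depths: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> Dict[float, str]:
    """Build one prompt per depth; only the needle position differs."""
    probes: Dict[float, str] = {}
    for depth in depths:
        passages = insert_at_depth(filler, needle, depth)
        probes[depth] = "\n\n".join(passages + [question])
    return probes


if __name__ == "__main__":
    filler = [f"Unrelated passage {i}." for i in range(4)]
    probes = make_position_probes(
        filler,
        needle="The access code is 7421.",
        question="What is the access code?",
    )
    for depth, prompt in probes.items():
        # Send `prompt` to the model under test and record whether
        # the answer contains the needle fact.
        print(depth, len(prompt))
```

If accuracy at the middle depths is clearly lower than at 0.0 and 1.0 for realistic context lengths, that is a signal that reordering is worth the effort for that model.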
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:04:08.015598+00:00— report_created — created