Report #66798
[counterintuitive] Model fails to retrieve or use information from the middle of a long context even when explicitly told it is there
Place critical information at the beginning or end of the context. For retrieval over long documents, use RAG with small retrieved chunks rather than stuffing entire documents into context. Never assume that 'in the context' means 'accessible to the model'.
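A minimal sketch of the recommendation above, under stated assumptions: the helper names (chunk_words, overlap_score, top_chunks, build_prompt) are hypothetical, the word-overlap scorer is only a stand-in for a real embedding-based retriever, and the chunk sizes are illustrative rather than tuned. It splits a long document into small chunks, keeps only the best matches, and states the question at both ends of the prompt.

```python
import re
from collections import Counter


def chunk_words(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (sizes are illustrative)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def overlap_score(query: str, chunk: str) -> int:
    """Count shared word tokens; a stand-in for an embedding similarity."""
    q = Counter(re.findall(r"\w+", query.lower()))
    c = Counter(re.findall(r"\w+", chunk.lower()))
    return sum((q & c).values())


def top_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks that best match the query, best first."""
    return sorted(chunks, key=lambda ch: overlap_score(query, ch), reverse=True)[:k]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Keep the context short and put the question at both edges."""
    context = "\n\n".join(chunks)
    return (
        f"Question: {question}\n\n"
        f"Context:\n{context}\n\n"
        f"Question (repeated): {question}"
    )
```

Called as build_prompt(q, top_chunks(q, chunk_words(document), k=3)), this sends a few short passages instead of the whole document, so the relevant text is never far from either end of the prompt.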
Journey Context:
The common belief is that if information exists anywhere in the context window, the model can attend to it equally, since it is all 'in the prompt'. Research on long-context retrieval (often called the 'lost in the middle' effect) shows a U-shaped performance curve: models retrieve well from the beginning and end of a context but degrade significantly for information in the middle. The effect has been observed across several model sizes and families, though its strength varies by model. It is not a bug that 'read carefully' instructions can fix. It appears to reflect a positional bias in how the model attends to its context, and an instruction to pay attention to the middle does not change the attention mechanism itself. The practical implication is counterintuitive: more context can mean worse performance if it pushes the relevant information into the middle. RAG with small, targeted chunks often outperforms long-context stuffing because the context stays short, so the retrieved information sits near the edges, where retrieval is most reliable.
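When several retrieved chunks must go into the prompt anyway, one way to act on the U-shaped curve is to order them so the strongest matches sit at the two ends and the weakest fall in the middle. The sketch below assumes the input list is already ranked best-first; the function name edges_first is hypothetical, and whether the reordering helps a given model is something to measure rather than assume.

```python
def edges_first(ranked: list[str]) -> list[str]:
    """Reorder a best-first list so top items land at the start and end.

    Items at even ranks fill the front in order; items at odd ranks
    fill the back in reverse, leaving the weakest items in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for rank, item in enumerate(ranked):
        if rank % 2 == 0:
            front.append(item)
        else:
            back.append(item)
    return front + back[::-1]


if __name__ == "__main__":
    # Chunk labels are placeholders; "c1" is the most relevant.
    print(edges_first(["c1", "c2", "c3", "c4", "c5"]))
```

The best chunk stays first, the second best moves to the last position, and each lower-ranked chunk is pushed one step further toward the middle.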
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T18:35:54.984874+00:00— report_created — created