Report #74703
[counterintuitive] Why does the model miss information placed in the middle of a long context even though the whole context fits within the window?
Place critical information at the beginning or end of the context; for retrieval-heavy tasks, use retrieval-augmented generation (RAG) to shorten the context rather than stuffing everything in; always test retrieval accuracy at your actual production context lengths.
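A minimal sketch of the placement advice, assuming you already have passages ranked most relevant first by whatever retriever or reranker you use. The helper name `order_for_edges` is hypothetical and not part of any library; it only illustrates one way to push the strongest passages toward the start and end of the prompt and leave the weakest in the middle.

```python
from typing import List


def order_for_edges(chunks_by_relevance: List[str]) -> List[str]:
    """Reorder passages so the most relevant sit at the edges of the context.

    Input must be sorted most relevant first. Even-ranked items fill the
    front in order, odd-ranked items fill the back in reverse, so the
    least relevant passages end up in the middle.
    """
    front: List[str] = []
    back: List[str] = []
    for rank, chunk in enumerate(chunks_by_relevance):
        if rank % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e"]  # "a" is the most relevant passage
    print(order_for_edges(ranked))
```

With five ranked passages, the top one lands first, the runner-up lands last, and the lowest-ranked one sits in the middle. Whether this helps on your model and data is something to measure, not assume.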
Journey Context:
Developers assume that if a context fits within the window, the model 'sees' all of it equally. Research on long-context retrieval shows a U-shaped accuracy curve, often called the "lost in the middle" effect: models use information at the beginning (primacy) and end (recency) of the context well, but accuracy drops significantly for information in the middle. This is not simply a bug that more training removes; it appears to be an emergent property of how transformer attention distributes across positions after training on documents with natural primacy/recency structure. Adding more context can actually HURT retrieval of middle-placed information, so a 128K context window does not mean 128K tokens of equally usable context. RAG often outperforms full-context stuffing even when everything fits, because it reduces the search space and positions retrieved information near the generation point.
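One way to check this on your own setup is a small position sweep: plant a known fact at different depths of a filler context and record how often the model returns it. The sketch below is an illustration under assumptions, not a standard benchmark. `ask_model` is a hypothetical callable that you would wrap around your own model client, and the substring match is a deliberately crude scoring rule.

```python
from typing import Callable, Dict, List, Sequence


def build_context(filler: List[str], needle: str, depth: float) -> str:
    """Insert `needle` at a relative depth of the filler lines.

    depth=0.0 puts it at the very start, depth=1.0 at the very end.
    """
    index = round(depth * len(filler))
    return "\n".join(filler[:index] + [needle] + filler[index:])


def accuracy_by_depth(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, answer text out
    filler: List[str],
    needle: str,
    question: str,
    expected: str,
    depths: Sequence[float],
    trials: int = 1,
) -> Dict[float, float]:
    """Return the fraction of trials, per depth, whose answer contains `expected`."""
    results: Dict[float, float] = {}
    for depth in depths:
        context = build_context(filler, needle, depth)
        prompt = f"{context}\n\nQuestion: {question}"
        hits = sum(
            expected.lower() in ask_model(prompt).lower() for _ in range(trials)
        )
        results[depth] = hits / trials
    return results
```

Size the filler to match your real production context lengths, sweep several depths (for example 0.0, 0.25, 0.5, 0.75, 1.0), and use more than one trial if your model samples non-deterministically. A dip in the middle depths is the pattern described above.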
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:59:09.635653+00:00— report_created — created