Report #29367
[counterintuitive] Model fails to retrieve or use information placed in the middle of a long context window
Place critical instructions and key information at the beginning or end of the context. For retrieval tasks, restructure content so the most important information sits at context boundaries. Prefer RAG with smaller targeted chunks over stuffing everything into a single long context.
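A minimal sketch of the boundary-placement idea, assuming the chunks are already ranked from most to least relevant. The helper names `order_for_boundaries` and `build_prompt` are hypothetical and not taken from any library. The most relevant chunks alternate between the front and the back of the context, so the least relevant ones land in the middle, and the instructions are stated first and repeated just before the question.

```python
from typing import List, Sequence


def order_for_boundaries(chunks_by_relevance: Sequence[str]) -> List[str]:
    """Reorder chunks (most relevant first) so the best ones sit at the edges.

    Even-ranked chunks fill the front in order; odd-ranked chunks fill the
    back in reverse, which pushes the least relevant chunks toward the middle.
    """
    front: List[str] = []
    back: List[str] = []
    for rank, chunk in enumerate(chunks_by_relevance):
        if rank % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


def build_prompt(instructions: str, chunks_by_relevance: Sequence[str], question: str) -> str:
    """Assemble a prompt with instructions at both boundaries (hypothetical layout)."""
    ordered = order_for_boundaries(chunks_by_relevance)
    parts = [instructions, *ordered, f"Reminder: {instructions}", question]
    return "\n\n".join(parts)


if __name__ == "__main__":
    ranked = ["A (most relevant)", "B", "C", "D", "E (least relevant)"]
    print(order_for_boundaries(ranked))
```

Whether repeating the instructions at the end helps depends on the model and task, so treat the layout as a starting point to measure rather than a guaranteed fix.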
Journey Context:
It is tempting to think "the model has a 128k+ context window, so I can dump everything in there and it will find what it needs." Research shows that LLMs exhibit a U-shaped performance curve over context position: they use information at the beginning (primacy effect) and end (recency effect) of the context most reliably, with significantly degraded performance on information in the middle. In contexts over roughly 10k tokens, middle-placed information can see retrieval accuracy drop by 20-50% compared to boundary-placed information. This is less a bug than a property of how transformer attention tends to distribute under next-token prediction training. As a result, adding more context can actually hurt performance, because it pushes more content into the poorly used middle. The real fix is context engineering: restructure, summarize, or use retrieval so that the working context stays focused and the important content sits at the edges.
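The retrieval approach can be sketched as follows, under simplifying assumptions: chunks are split by word count, and relevance is scored by plain keyword overlap with the query. A real pipeline would more likely use embeddings or a search index. The function names `split_into_chunks` and `top_k_chunks` are hypothetical, chosen for illustration.

```python
import re
from typing import List, Set


def split_into_chunks(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def _tokens(s: str) -> Set[str]:
    """Lowercased alphanumeric tokens, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def top_k_chunks(query: str, chunks: List[str], k: int = 3) -> List[str]:
    """Return the k chunks sharing the most tokens with the query.

    Assumption: keyword overlap stands in for a real relevance model.
    Ties keep their original document order because the sort is stable.
    """
    q = _tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & _tokens(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    docs = [
        "the sky is blue",
        "the refund window is 30 days",
        "cats sleep a lot",
    ]
    print(top_k_chunks("what is the refund window", docs, k=1))
```

Sending only the selected chunks keeps the working context short, so less material ends up in the middle positions where retrieval is weakest.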
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:40:59.790434+00:00— report_created — created