Report #35362
[counterintuitive] LLM misses information in the middle of a long context window
Place the most critical instructions and retrieved data at the very beginning or the very end of the prompt context. Use RAG to reduce context size rather than stuffing everything into a massive prompt.
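A minimal sketch of the placement advice above, assuming the retriever already returns chunks sorted most-relevant-first. The names `order_for_edges`, `build_prompt`, and `max_chunks` are hypothetical and not tied to any library. The idea is to cap how many chunks go in (the RAG step), push the strongest chunks to the edges of the context, and repeat the instructions near the end.

```python
def order_for_edges(chunks_ranked):
    """Reorder chunks (most relevant first) so the best ones sit at the edges.

    Even-ranked chunks fill the front, odd-ranked chunks fill the back in
    reverse, which leaves the least relevant chunks in the middle.
    """
    front, back = [], []
    for i, chunk in enumerate(chunks_ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_prompt(instructions, chunks_ranked, question, max_chunks=5):
    """Assemble a prompt with instructions at both ends of the context.

    max_chunks is an assumed cap: keep only the top retrieved chunks
    instead of stuffing everything into the prompt.
    """
    kept = chunks_ranked[:max_chunks]
    context = "\n\n".join(order_for_edges(kept))
    return "\n\n".join([
        instructions,                 # start of the prompt
        context,                      # retrieved data, best chunks at the edges
        f"Reminder: {instructions}",  # instructions repeated near the end
        f"Question: {question}",
    ])


if __name__ == "__main__":
    ranked = ["chunk A (best)", "chunk B", "chunk C", "chunk D", "chunk E (worst)"]
    print(build_prompt("Answer only from the context.", ranked, "What is X?"))
```

Whether repeating the instructions helps for a given model is worth checking empirically; the ordering and the cap are the parts that follow directly from the advice above.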
Journey Context:
The prevailing belief is that a 128k context window means the model uniformly 'reads' and 'remembers' all 128k tokens. Empirical studies instead show that LLMs exhibit a 'U-shaped' recall curve: they attend strongly to the beginning of the context (primacy effect) and to the end (recency effect), but recall of information placed in the middle drops sharply. Prompting the model to 'pay attention to the middle' does not reliably fix this, because the effect appears to come from how the attention mechanism handles very long sequences, where it struggles to maintain distinct representations, rather than from how the prompt is worded.
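One way to check the U-shaped curve on your own model is a small position probe: hide a known fact (the "needle") at different depths in filler text and see where the model stops finding it. The sketch below is an assumption-laden illustration, not a standard benchmark. `ask_model` is a hypothetical callable that you supply (prompt string in, answer string out), and `make_probe` and `recall_by_depth` are illustrative names.

```python
def make_probe(filler_sentences, needle, depth):
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    index = round(depth * len(filler_sentences))
    sentences = filler_sentences[:index] + [needle] + filler_sentences[index:]
    return " ".join(sentences)


def recall_by_depth(ask_model, filler_sentences, needle, question, expected, depths):
    """Return {depth: True/False} for whether the answer contains `expected`.

    ask_model is a hypothetical callable wrapping whatever LLM client you use.
    """
    results = {}
    for depth in depths:
        prompt = make_probe(filler_sentences, needle, depth) + "\n\n" + question
        answer = ask_model(prompt)
        results[depth] = expected.lower() in answer.lower()
    return results


if __name__ == "__main__":
    filler = [f"Filler sentence number {i}." for i in range(1000)]
    needle = "The access code for the vault is 7421."

    # Stand-in for a real model call: it just echoes the prompt, so every
    # depth reports a hit. Replace it with a real client to see actual recall.
    def echo_model(prompt):
        return prompt

    print(recall_by_depth(
        echo_model, filler, needle,
        question="What is the access code for the vault?",
        expected="7421",
        depths=[0.0, 0.25, 0.5, 0.75, 1.0],
    ))
```

A single needle per depth is noisy; repeating the probe with several needles and filler sets gives a more trustworthy picture before changing prompt layout.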
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:49:53.206455+00:00— report_created — created