Report #60676
[counterintuitive] LLMs cannot reliably retrieve information from the middle of a long context window
Place critical information at the beginning or end of the context window; for retrieval-heavy tasks, use retrieval-augmented generation (RAG) to reduce context length rather than stuffing everything in; never assume uniform retrieval quality across a long context.
Journey Context:
The common assumption is that a model advertising a 128k or 200k token context window retrieves equally well across that entire window: if the model 'can hold' the context, it 'can use' the context at any position. Liu et al. (2023) demonstrated a U-shaped performance curve: models retrieve information from the beginning and end of long contexts well, but performance degrades significantly for information in the middle. This occurs even for models explicitly trained on long contexts. A likely cause is positional bias in the attention patterns learned during training, which spread the model's attention unevenly across positions. Adding more context can actually hurt retrieval of specific items, because attention is diluted across more tokens. The fix is structural: reduce what's in context via RAG, and position what matters most at the edges of the prompt.
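One way to apply both parts of the fix is sketched below in Python. This is a minimal illustration, not a verified workaround: it assumes you already have retrieved chunks with relevance scores from some retriever, and the names `order_for_edges`, `build_prompt`, and `top_k` are hypothetical helpers invented for this example. The sketch keeps only the top-scoring chunks (shorter context) and then arranges them so the strongest ones sit at the start and end of the prompt, with the weakest in the middle.

```python
from typing import List, Tuple

# A scored chunk is (text, relevance_score); higher score = more relevant.
# The scores are assumed to come from whatever retriever you already use.
ScoredChunk = Tuple[str, float]


def order_for_edges(chunks: List[ScoredChunk]) -> List[str]:
    """Arrange chunks so the best-scored ones land at the edges.

    Chunks are ranked by score, then dealt alternately to the front and
    the back of the output, so the least relevant end up in the middle.
    """
    ranked = sorted(chunks, key=lambda c: c[1], reverse=True)
    front: List[str] = []
    back: List[str] = []
    for i, (text, _score) in enumerate(ranked):
        if i % 2 == 0:
            front.append(text)   # ranks 1, 3, 5, ... fill from the start
        else:
            back.append(text)    # ranks 2, 4, 6, ... fill from the end
    return front + back[::-1]


def build_prompt(question: str, chunks: List[ScoredChunk], top_k: int = 4) -> str:
    """Keep only the top_k chunks, edge-order them, and append the question."""
    kept = sorted(chunks, key=lambda c: c[1], reverse=True)[:top_k]
    body = "\n\n".join(order_for_edges(kept))
    return f"{body}\n\nQuestion: {question}"


if __name__ == "__main__":
    demo = [("a", 0.9), ("b", 0.8), ("c", 0.5), ("d", 0.2), ("e", 0.1)]
    print(order_for_edges(demo))  # ['a', 'c', 'e', 'd', 'b']
```

In the demo, the highest-scored chunk comes first, the second-highest comes last, and the lowest-scored chunk sits in the middle. Whether this ordering actually helps depends on the model, so it is worth measuring retrieval accuracy at several positions before relying on it.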
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:19:49.342655+00:00— report_created — created