Report #92833
[counterintuitive] Why LLMs fail to retrieve information from the middle of a long context document
Put the most critical instructions and retrieval targets at the very beginning or very end of the prompt; use RAG to shorten context rather than dumping entire documents into the context window.
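A minimal sketch of the recommendation above, assuming a toy keyword-overlap retriever in place of a real embedding-based RAG pipeline. The names tokenize, top_chunks, and build_prompt are hypothetical helpers written for this illustration and do not come from any library. The prompt states the instructions first, includes only the best-matching chunks, and repeats the instructions at the end.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_chunks(question, chunks, k=2):
    """Toy retriever (assumption): rank chunks by word overlap with the question.

    A real system would typically use embeddings; this stand-in only shows
    where retrieval fits. sorted() is stable, so ties keep document order.
    """
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(instructions, question, chunks, k=2):
    """Place instructions at both edges and only the top-k chunks in between."""
    selected = top_chunks(question, chunks, k)
    parts = [
        instructions,                   # critical text at the very beginning
        *selected,                      # shortened context instead of the full document
        f"Question: {question}",
        f"Reminder: {instructions}",    # critical text repeated at the very end
    ]
    return "\n\n".join(parts)


if __name__ == "__main__":
    docs = [
        "The refund window is 30 days.",
        "Shipping takes 5 days.",
        "Our office is in Berlin.",
    ]
    print(build_prompt("Answer only from the context.",
                       "How long is the refund window?", docs, k=1))
```

Repeating the instructions costs a few extra tokens but keeps them out of the middle of the prompt regardless of how much context is retrieved.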
Journey Context:
The community often assumes that a 100k+ context window means the model can attend perfectly to everything within it. However, transformer attention mechanisms suffer from 'attention dilution': as the context grows, attention is spread across more tokens. Research on long-context retrieval (for example, the "Lost in the Middle" study by Liu et al.) shows a U-shaped performance curve: models readily recall items at the start and end of a context but often miss items in the middle. This appears to be an artifact of how attention weights are distributed across positions, not a prompt engineering failure.
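One way to check whether a given model shows the U-shaped curve described above is a position sweep: place a known fact (the "needle") at different depths in filler text and record whether the model's reply contains the expected answer. The sketch below is a hedged illustration only. place_needle and recall_by_depth are hypothetical helpers, and ask_model is an assumed caller-supplied function that takes a prompt string and returns the model's reply as a string; no real API is implied.

```python
def place_needle(filler, needle, depth):
    """Insert needle into a list of filler lines at a fractional depth.

    depth=0.0 puts the needle first, depth=1.0 puts it last.
    """
    idx = round(depth * len(filler))
    return filler[:idx] + [needle] + filler[idx:]


def recall_by_depth(ask_model, filler, needle, answer, question, depths):
    """Return {depth: bool} marking whether the reply contained the answer.

    ask_model is an assumption: any callable mapping a prompt string to a
    reply string, for example a thin wrapper around an LLM client.
    """
    results = {}
    for depth in depths:
        document = "\n".join(place_needle(filler, needle, depth))
        reply = ask_model(f"{document}\n\n{question}")
        results[depth] = answer.lower() in reply.lower()
    return results
```

A single run per depth is noisy; averaging over several needles and filler texts gives a more trustworthy curve before drawing conclusions about a specific model.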
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:24:30.173807+00:00— report_created — created