Report #82836
[counterintuitive] Model ignores or hallucinates information placed in the middle of a long context — appears to be carelessness or poor attention
Place critical instructions, key facts, and few-shot examples at the beginning or end of the context window, and avoid burying essential information in the middle of a long prompt. For retrieval-heavy tasks, use retrieval-augmented generation (RAG) to select only the most relevant chunks rather than dumping everything into context.
Journey Context:
Developers load a 50K-token context with all relevant documents and assume the model has equal access to everything within it. In practice, a model's use of its context is not uniform across positions. It often follows a U-shaped pattern: information near the beginning (primacy effect) and the end (recency effect) is used most reliably, while information in the middle is used noticeably less reliably. This "lost in the middle" effect is not carelessness; it appears to be an emergent property of how attention heads distribute weight over long sequences. Crucially, adding more context can actively hurt performance if it pushes important information into the middle. A model that correctly answers a question from a 2K-token context may fail at the same question in a 30K-token context where the answer sits around token 15K. This is deeply counterintuitive: more information can mean worse performance. The fix is structural: put system instructions and critical context at the start, put the query and any reference material that needs close reading at the end, and keep the total context as short as the task allows. RAG is not just an efficiency measure; it is a correctness measure.
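The structural fix described above can be sketched in a few lines. This is an illustrative sketch, not a tested recipe: the function names (`select_top_k`, `order_for_edges`, `build_prompt`), the `(score, text)` chunk format, and the default `k=4` are all assumptions made for the example, and the relevance scores are assumed to come from whatever retriever is already in use. The idea is to keep only the top-scoring chunks, then arrange them so the strongest ones sit at the edges of the document block and the weakest land in the middle, with instructions first and the query last.

```python
from typing import List, Tuple

Chunk = Tuple[float, str]  # (relevance score, chunk text); format is hypothetical


def select_top_k(scored_chunks: List[Chunk], k: int) -> List[Chunk]:
    """Keep only the k highest-scoring chunks, most relevant first."""
    return sorted(scored_chunks, key=lambda c: c[0], reverse=True)[:k]


def order_for_edges(ranked: List[Chunk]) -> List[Chunk]:
    """Alternate ranked chunks between the front and the back of the list,
    so the least relevant chunks end up in the middle."""
    front: List[Chunk] = []
    back: List[Chunk] = []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_prompt(instructions: str, scored_chunks: List[Chunk],
                 query: str, k: int = 4) -> str:
    """Instructions at the start, trimmed documents in between, query at the end."""
    ordered = order_for_edges(select_top_k(scored_chunks, k))
    docs = "\n\n".join(text for _, text in ordered)
    return f"{instructions}\n\n{docs}\n\n{query}"


if __name__ == "__main__":
    chunks = [(0.9, "A"), (0.1, "E"), (0.8, "B"), (0.6, "D"), (0.7, "C")]
    print(build_prompt("SYS", chunks, "Q"))
```

With the sample scores above, chunk E is dropped, A opens the document block, B closes it, and C and D fall in the middle. Whether this ordering helps a given model should be checked empirically, for example by moving a known answer through different positions and comparing accuracy.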
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:37:38.861968+00:00— report_created — created