Report #43199
[counterintuitive] Model with 128k+ context window can't find information placed in the middle of a long prompt
Place critical information at the very beginning or very end of the context window. For retrieval tasks, use RAG to keep context short rather than stuffing entire documents into the prompt. Never assume uniform attention quality across a long context.
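The placement advice above can be sketched in a few lines of Python. This is a minimal illustration, not a tested recipe: the function names `order_for_edges` and `build_prompt`, the `top_k` default, and the prompt layout are all assumptions made for this example, and the input is assumed to be retrieved chunks already sorted from most to least relevant.

```python
from typing import List, Sequence


def order_for_edges(ranked_chunks: Sequence[str]) -> List[str]:
    """Put the most relevant chunks at the edges and the least relevant in the middle.

    Assumes `ranked_chunks` is sorted from most to least relevant.
    """
    front: List[str] = []
    back: List[str] = []
    for i, chunk in enumerate(ranked_chunks):
        # Alternate: even ranks go to the front half, odd ranks to the back half.
        if i % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse the back half so the second-best chunk ends up in last position.
    return front + back[::-1]


def build_prompt(question: str, ranked_chunks: Sequence[str], top_k: int = 4) -> str:
    """Keep only the top_k chunks (short context) and state the question at both ends."""
    ordered = order_for_edges(list(ranked_chunks)[:top_k])
    body = "\n\n".join(ordered)
    return f"Question: {question}\n\n{body}\n\nQuestion (repeated): {question}"
```

With five chunks ranked `a` through `e`, `order_for_edges` returns `a, c, e, d, b`: the two strongest chunks sit first and last, and the weakest lands in the middle. Truncating to `top_k` before reordering reflects the recommendation to keep the context short rather than fill the window.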
Journey Context:
Large context windows create the expectation that a model can use every part of the context equally well. Research on long-context retrieval (the "lost in the middle" effect) shows a U-shaped performance curve: models use information at the beginning (primacy effect) and end (recency effect) of the context well, but accuracy degrades significantly for information in the middle. This is not a bug; it is an emergent property of how transformer attention distributes across long sequences. Usable context does not grow linearly with window size. A 128k window does not give you 128k equally attended tokens; it gives you strong attention at the edges and a large weak-attention dead zone in the middle. Better prompt wording cannot reshape the attention distribution.
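One way to check whether a particular model shows this position sensitivity is to sweep a known fact through different depths of a long filler context and record where retrieval fails. The sketch below is only a harness under stated assumptions: `ask_model` is a hypothetical callable (prompt in, answer text out) that you would wire to your own model client, and `build_probe` and `position_sweep` are names invented for this example.

```python
from typing import Callable, Dict, List


def build_probe(filler: List[str], needle: str, depth: float) -> str:
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the filler lines."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))
    return "\n".join(filler[:index] + [needle] + filler[index:])


def position_sweep(
    ask_model: Callable[[str], str],  # hypothetical: prompt in, answer text out
    filler: List[str],
    needle: str,
    question: str,
    expected: str,
    depths: List[float],
) -> Dict[float, bool]:
    """Record, per depth, whether the model's answer contains the expected string."""
    results: Dict[float, bool] = {}
    for depth in depths:
        prompt = build_probe(filler, needle, depth) + "\n\n" + question
        results[depth] = expected in ask_model(prompt)
    return results
```

Running the sweep at depths such as 0.0, 0.25, 0.5, 0.75, and 1.0 with a realistic amount of filler would show whether failures cluster around the middle for your model and context length. A single needle per depth is noisy, so repeated trials with different needles are advisable before drawing conclusions.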
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:58:58.692402+00:00 - report_created - created