Report #95134
[counterintuitive] With a 128k+ context window, I can dump all documents into context and the model will find what it needs
Place the most critical information at the beginning and end of the context window. For retrieval-heavy tasks, use RAG to surface only relevant chunks rather than relying on the model to find needles in a haystack. If you must use long context, structure it with clear section headers and repeat key instructions at both the start and end.
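The structuring advice above can be sketched as a small prompt builder. This is a minimal illustration, not a prescribed format: the function name `build_prompt`, the `=== title ===` header style, and the `INSTRUCTIONS` / `REMINDER OF INSTRUCTIONS` labels are all assumptions made for the example.

```python
def build_prompt(instructions: str, sections: dict[str, str], question: str) -> str:
    """Assemble a long-context prompt with the key instructions at both edges.

    All labels and the header style are illustrative choices, not a standard.
    """
    key = instructions.strip()
    # Key instructions go first, where they are most likely to be used.
    parts = [f"INSTRUCTIONS:\n{key}"]
    # Each document gets a clear section header so it can be located and cited.
    for title, body in sections.items():
        parts.append(f"=== {title} ===\n{body.strip()}")
    parts.append(f"QUESTION:\n{question.strip()}")
    # Repeat the key instructions at the very end of the context.
    parts.append(f"REMINDER OF INSTRUCTIONS:\n{key}")
    return "\n\n".join(parts)
```

Calling `build_prompt("Answer only from the documents.", {"Doc A": "...", "Doc B": "..."}, "What is the refund window?")` yields a prompt that opens and closes with the same instruction text, with the documents in labeled sections in between.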
Journey Context:
The assumption is that a 128k context window means the model can use all 128k tokens equally well. Research shows this is false: LLMs exhibit a U-shaped performance pattern in which information at the beginning and end of the context is used well, while information in the middle is used significantly less reliably. Liu et al. (2023) demonstrated that model performance on retrieval tasks drops sharply when the relevant information sits in the middle of a long context, even for models specifically marketed as having long context windows. This is not a bug in any specific model. It appears to be a consequence of how transformer attention patterns develop during training, where the beginning (system prompt, task description) and the end (most recent context) are disproportionately important. Adding more context can therefore hurt performance if it pushes critical information into the poorly attended middle. The fix is either RAG (keep the context short and relevant) or strategic placement (put key information at the edges).
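Both fixes can be sketched in a few lines. This is a toy illustration under stated assumptions: `overlap_score` is a stand-in for a real relevance score (an actual RAG pipeline would typically use embedding similarity), and the names `retrieve` and `place_at_edges` are hypothetical helpers introduced only for this example.

```python
def overlap_score(query: str, chunk: str) -> int:
    """Toy relevance score: number of distinct words shared by query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Keep only the k most relevant chunks, most relevant first (the RAG option)."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    return ranked[:k]


def place_at_edges(chunks_ranked: list[str]) -> list[str]:
    """Reorder chunks (most relevant first) so the best ones sit at the edges.

    Even-ranked chunks fill the front in order; odd-ranked chunks fill the
    back in reverse, so the least relevant chunks end up in the middle.
    """
    front = chunks_ranked[0::2]
    back = chunks_ranked[1::2]
    return front + back[::-1]
```

For chunks ranked `["A", "B", "C", "D", "E"]` from most to least relevant, `place_at_edges` returns `["A", "C", "E", "D", "B"]`: the two strongest chunks occupy the first and last positions, and the weakest sits in the middle.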
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:15:33.816830+00:00— report_created — created