Report #54211
[counterintuitive] Model has a large context window so I can put all documents in context and it will find the answer
Place critical information at the beginning or end of the context window; use RAG to retrieve only relevant chunks rather than dumping entire documents; for long contexts, duplicate key information at both ends; keep context as short as possible even when the window is large
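The workarounds above can be combined into one prompt-assembly step. The sketch below is a minimal illustration, not a reference implementation. It assumes a toy word-overlap scorer standing in for a real retriever (an embedding index or BM25 would normally fill that role), and the function names `chunk_text`, `score`, `reorder_for_edges`, and `build_prompt` are hypothetical. It keeps only the `top_k` best chunks, places the strongest chunks at the start and end with the weakest in the middle, and optionally repeats a key fact at both ends.

```python
from typing import List


def chunk_text(text: str, chunk_words: int = 200) -> List[str]:
    # Hypothetical splitter: fixed-size word windows, no overlap.
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def score(query: str, chunk: str) -> float:
    # Toy relevance score (assumption): fraction of query words found in the chunk.
    # A real pipeline would use an embedding model or BM25 here.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0


def reorder_for_edges(ranked: List[str]) -> List[str]:
    # Input is sorted best-first. Alternate chunks between front and back so the
    # strongest chunks sit at both edges and the weakest land in the middle.
    front, back = [], []
    for i, chunk in enumerate(ranked):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_prompt(query: str, docs: List[str], top_k: int = 4, key_fact: str = "") -> str:
    # Retrieve only the top_k chunks instead of pasting whole documents.
    chunks = [c for doc in docs for c in chunk_text(doc)]
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]
    parts = []
    if key_fact:
        parts.append(f"Key fact: {key_fact}")  # copy at the start
    parts.append("\n\n".join(reorder_for_edges(ranked)))
    if key_fact:
        parts.append(f"Reminder: {key_fact}")  # duplicate near the end
    parts.append(f"Question: {query}")
    return "\n\n".join(parts)
```

For five ranked chunks `[r1, r2, r3, r4, r5]`, `reorder_for_edges` yields `[r1, r3, r5, r4, r2]`: the two best chunks take the edge positions and the weakest sits in the middle. A small `top_k` also enforces the "keep context short" advice even when the window could hold far more.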
Journey Context:
Models show a U-shaped accuracy curve over position: information at the beginning or end of the context is used reliably, but accuracy drops noticeably when the relevant information sits in the middle. This "lost in the middle" effect has been reported across many model sizes and context lengths, so doubling the context window doesn't help if the needle is still in the middle of the haystack. It is generally attributed to how attention and positional information behave over long sequences rather than to the wording of any particular prompt, so it is not a bug that can be reliably prompted away. The practical implication is significant: a 128K context window does NOT mean the model uses all 128K tokens equally well. An answer on page 50 of a 100-page document will be found less reliably than the same answer on page 1 or page 100. RAG isn't just about saving tokens; it's about positioning information where the model actually attends.
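Rather than assuming a particular curve, you can measure it for your own model with a position sweep ("needle in a haystack"): plant a known fact at different relative depths in filler text and record accuracy at each depth. A minimal sketch follows. `ask` is a hypothetical callable wrapping whatever model API you use, and `edge_only_model` is a toy stand-in that only "sees" the first and last few words. It exists purely so the harness runs without a model and is not a simulation of real attention.

```python
from typing import Callable, Dict, List


def build_haystack(filler: List[str], needle: str, depth: float) -> str:
    # Insert the needle at a relative depth: 0.0 = start, 1.0 = end.
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    idx = round(depth * len(filler))
    return " ".join(filler[:idx] + [needle] + filler[idx:])


def position_sweep(
    ask: Callable[[str, str], str],  # hypothetical: ask(context, question) -> answer
    filler: List[str],
    needle: str,
    question: str,
    expected: str,
    depths: List[float],
    trials: int = 1,
) -> Dict[float, float]:
    # Return the fraction of trials whose answer contains `expected`, per depth.
    results = {}
    for d in depths:
        context = build_haystack(filler, needle, d)
        hits = sum(expected.lower() in ask(context, question).lower() for _ in range(trials))
        results[d] = hits / trials
    return results


def edge_only_model(context: str, question: str, window: int = 20) -> str:
    # Toy stand-in (assumption): "answers" with only the first and last `window` words.
    words = context.split()
    return " ".join(words[:window] + words[-window:])


if __name__ == "__main__":
    filler = [f"Filler sentence number {i}." for i in range(100)]
    acc = position_sweep(edge_only_model, filler, "The secret code is 7421.",
                         "What is the secret code?", "7421", [0.0, 0.5, 1.0])
    print(acc)  # {0.0: 1.0, 0.5: 0.0, 1.0: 1.0}
```

The toy model fails at depth 0.5 by construction. A real model will produce its own curve, usually degraded rather than zero in the middle, and that curve is what the sweep is for; use several depths and several trials per depth rather than a single run.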
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:29:34.427832+00:00— report_created — created