Report #38815
[counterintuitive] Why does the model ignore or hallucinate over information clearly provided in the middle of a long prompt, even well within the context window limit?
Place the most critical information at the very beginning or very end of your context. In RAG pipelines, put the most relevant retrieved documents at the start and end of the context, with the least relevant in the middle, rather than in strict relevance-ranked or chronological order. Repeat crucial instructions at the end of the prompt. For very long contexts, consider splitting the work into multiple focused calls rather than one comprehensive call.
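A minimal sketch of the positioning advice above, assuming the retriever already returns documents sorted most relevant first. The function names `order_for_edges` and `build_prompt` and the "Reminder:" wording are hypothetical, chosen only for illustration, and no particular model API is implied.

```python
from typing import List


def order_for_edges(docs_by_relevance: List[str]) -> List[str]:
    """Reorder docs so the most relevant sit at the start and end.

    Input must be sorted most relevant first. Docs are dealt alternately
    to the front and the back, so the least relevant end up in the middle.
    """
    front: List[str] = []
    back: List[str] = []
    for i, doc in enumerate(docs_by_relevance):
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so the 2nd most relevant doc is the very last one.
    return front + back[::-1]


def build_prompt(instruction: str, docs_by_relevance: List[str], question: str) -> str:
    """Assemble a prompt with the instruction stated first and repeated last."""
    ordered = order_for_edges(docs_by_relevance)
    parts = [instruction, *ordered, question, f"Reminder: {instruction}"]
    return "\n\n".join(parts)
```

With five documents ranked `d1` (best) to `d5` (worst), `order_for_edges` returns `d1, d3, d5, d4, d2`: the two strongest documents occupy the edges and the weakest sits in the middle. Whether this ordering helps for a given model is something to measure, not assume.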
Journey Context:
The intuitive assumption is that if information fits within the context window, the model attends to it equally. Research on long-context language models reports a striking U-shaped performance curve: models reliably recall information at the beginning and end of the context but perform significantly worse on information in the middle, even when the total context is far below the model's maximum. This is not a retrieval failure or a prompt engineering mistake; it appears to be a property of how transformer attention distributes across positions. The counterintuitive implication is that adding more context can actually hurt: a 50K-token context with your key fact at position 25K can perform worse than a 5K-token context with the same fact at position 1K. Developers who "stuff the context" with everything relevant often get worse results than those who curate ruthlessly and position strategically. The tradeoff is between completeness (having all information available) and salience (ensuring critical information sits in well-attended positions).
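The "curate ruthlessly" side of this tradeoff can be sketched as a simple budgeted selection. This is an illustration under stated assumptions: relevance scores come from your own retriever, `approx_tokens` is a crude whitespace word count standing in for a real tokenizer, and the names `approx_tokens` and `curate_context` are hypothetical.

```python
from typing import List, Tuple


def approx_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words.

    Real token counts will differ; swap in your model's tokenizer.
    """
    return len(text.split())


def curate_context(scored_docs: List[Tuple[float, str]], budget: int) -> List[str]:
    """Keep the highest-scoring docs that fit within a token budget.

    Docs are considered from highest to lowest score. A doc that would
    exceed the budget is skipped, so a smaller lower-scored doc may still fit.
    """
    kept: List[str] = []
    used = 0
    for _score, doc in sorted(scored_docs, key=lambda pair: pair[0], reverse=True):
        cost = approx_tokens(doc)
        if used + cost > budget:
            continue
        kept.append(doc)
        used += cost
    return kept
```

The result is sorted most relevant first, which makes it a suitable input for any later positioning step. The budget itself is a tuning knob: a smaller context trades completeness for salience.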
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:37:26.160143+00:00— report_created — created