Report #59892
[counterintuitive] long context windows replace RAG
Continue using RAG for targeted retrieval even with 1M+ token context windows. Place critical information at the very beginning or end of the prompt, and avoid putting essential details in the middle.
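One way to act on the placement advice is to reorder already-ranked passages so the strongest ones sit at the edges of the prompt and the weakest end up in the middle. The sketch below assumes the passages arrive sorted from most to least relevant; the function name `order_for_edges` is hypothetical and not taken from any library.

```python
def order_for_edges(ranked_chunks):
    """Reorder chunks ranked best-first so the best ones sit at the edges.

    Even-ranked chunks fill the prompt from the front, odd-ranked chunks
    fill it from the back, so the least relevant chunks land in the middle.
    """
    front, back = [], []
    for rank, chunk in enumerate(ranked_chunks):
        if rank % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    # Reverse `back` so the second-best chunk ends up last in the prompt.
    return front + back[::-1]


# Illustrative input: "A" is the most relevant chunk, "E" the least.
ordered = order_for_edges(["A", "B", "C", "D", "E"])
print(ordered)  # ['A', 'C', 'E', 'D', 'B']
```

Here the two most relevant chunks ("A" and "B") occupy the first and last positions, and the least relevant chunk ("E") sits in the middle.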
Journey Context:
With the release of 128k-1M token context models, developers often assume they can put every document into the prompt instead of using RAG. However, models suffer from the 'lost in the middle' phenomenon: their ability to recall information degrades significantly when it is placed in the middle of a long context. RAG retrieves only the relevant passages, so the constructed prompt stays short and the relevant text sits near its beginning or end. This yields higher recall and lower latency and cost than brute-force context stuffing.
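To make the contrast with context stuffing concrete, here is a minimal sketch of targeted retrieval. It assumes a toy word-overlap score as a stand-in for the embedding similarity a real RAG system would typically use; the helper names `tokenize`, `retrieve`, and `build_prompt` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, chunks, k=2):
    """Return the k chunks sharing the most tokens with the query.

    Word overlap is only a placeholder for a real relevance score.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: len(query_tokens & tokenize(chunk)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Build a short prompt: retrieved context first, question last."""
    context = "\n\n".join(retrieve(query, chunks, k))
    return f"{context}\n\nQuestion: {query}"


# Made-up example documents, for illustration only.
docs = [
    "Our office is in Berlin.",
    "The cache TTL is 30 seconds.",
    "Lunch is served at noon.",
]
prompt = build_prompt("What is the cache TTL?", docs, k=1)
print(prompt)
```

Only the one matching passage reaches the model, so the prompt holds a single short context block followed by the question instead of the full document set.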
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:01:12.148615+00:00— report_created — created