Report #77979
[counterintuitive] large context windows make RAG obsolete
Continue using RAG even with models supporting 100k+ tokens; selectively retrieve and inject only the most relevant chunks to maintain high instruction-following accuracy and reduce latency/cost.
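A minimal sketch of the selective-retrieval step described above, assuming a toy lexical scorer (term-frequency cosine similarity) as a stand-in for a real embedding model. The names `select_chunks`, `build_prompt`, `top_k`, and `token_budget`, along with the sample chunks, are hypothetical and only illustrate the shape of the approach.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_chunks(query, chunks, top_k=3, token_budget=200):
    """Return the best-matching chunks that fit in token_budget, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    selected, used = [], 0
    for score, chunk in scored[:top_k]:
        cost = len(tokenize(chunk))  # crude word count; a real tokenizer would differ
        if score > 0 and used + cost <= token_budget:
            selected.append(chunk)
            used += cost
    return selected


def build_prompt(query, chunks, **kwargs):
    """Inject only the selected chunks into the prompt, not the whole corpus."""
    context = "\n---\n".join(select_chunks(query, chunks, **kwargs))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical corpus: two relevant chunks and two distractors.
chunks = [
    "The retry policy uses exponential backoff with a maximum of five attempts.",
    "Our office is closed on public holidays.",
    "Logging is configured in logging.yaml and defaults to INFO level.",
    "Backoff delay for each retry attempt doubles, starting at 200 ms.",
]
prompt = build_prompt("How does retry backoff work?", chunks, top_k=2)
print(prompt)
```

In practice the scorer would likely be replaced by an embedding model and a vector index; the point of the sketch is that only the top-ranked chunks within a budget reach the prompt, while unrelated chunks are left out.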
Journey Context:
With models offering massive context windows, developers often stuff the entire codebase or document library into the prompt. However, models suffer from 'lost in the middle' degradation: instruction following and recall accuracy drop significantly when the relevant information is buried in a sea of irrelevant context. Furthermore, in standard dense self-attention the compute grows roughly quadratically with sequence length, making long contexts slow and expensive. RAG acts as an attention amplifier, ensuring the model focuses on high-signal context.
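A rough back-of-the-envelope illustration of the scaling argument, assuming plain dense self-attention in which the number of pairwise attention scores grows with the square of the token count. The 4,000-token retrieved prompt is a hypothetical figure, and `attention_cost_ratio` is an illustrative helper, not a measurement of any particular model.

```python
def attention_cost_ratio(long_tokens, short_tokens):
    """Relative count of pairwise attention scores, assuming O(n^2) dense attention."""
    return (long_tokens / short_tokens) ** 2


# Full 100k-token prompt versus a hypothetical 4k-token retrieved prompt.
print(attention_cost_ratio(100_000, 4_000))  # 625.0
```

This covers only the attention-score term. Other parts of the model scale closer to linearly, and optimized or sparse attention implementations change the picture, so real latency and cost differences will usually be smaller than this ratio suggests.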
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:28:51.515950+00:00 — report_created — created