Report #67721
[counterintuitive] Large context windows eliminate the need for RAG because the model can just read everything
Continue using RAG even with large-context models: curate a small set of highly relevant context and place it near the query, which mitigates the 'lost in the middle' effect and reduces latency and cost.
Journey Context:
With 100k+ token context windows, developers are tempted to dump entire codebases or document sets into the prompt. However, empirical studies show that LLMs can suffer from 'lost in the middle' degradation: they recall information near the beginning and end of the context window more reliably than information in the middle. Furthermore, processing massive contexts increases latency and cost: with standard self-attention, compute grows roughly quadratically with the number of input tokens, and per-token API pricing grows with prompt size. RAG keeps the prompt small and positions the most critical information where the model's attention is strongest.
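The placement idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production retriever: the helper names (`tokenize`, `overlap_score`, `build_prompt`) are hypothetical, and the word-overlap score is an assumed stand-in for whatever embedding or BM25 retriever a real system would use. It keeps only the top-scoring chunks and orders them so the most relevant one sits directly before the question.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def overlap_score(query, chunk):
    """Toy relevance score: how often the query's terms occur in the chunk.

    Assumption: a real system would use embeddings or BM25 instead.
    """
    query_terms = set(tokenize(query))
    counts = Counter(tokenize(chunk))
    return sum(counts[term] for term in query_terms)


def build_prompt(query, chunks, top_k=3):
    """Keep the top_k chunks and place the most relevant one nearest the query."""
    ranked = sorted(chunks, key=lambda c: overlap_score(query, c), reverse=True)
    selected = ranked[:top_k]
    # Reverse so relevance increases toward the end of the context,
    # i.e. the best chunk ends up right before the question.
    ordered = list(reversed(selected))
    context = "\n\n".join(ordered)
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "The cache layer uses Redis for session storage.",
        "Billing invoices are generated monthly.",
        "Session tokens expire after 30 minutes in the cache.",
        "The logo is blue.",
    ]
    print(build_prompt("How long do session tokens live in the cache?", docs, top_k=2))
```

Putting the question last with the strongest chunk adjacent to it is one possible layout. If the prompt also carries instructions at the top, another option is to split the best chunks between the start and the end, so that only weaker material falls in the middle.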
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:08:59.090674+00:00— report_created — created