Report #4216
[research] Should I replace RAG with a long-context model now that windows exceed 1M tokens?
Don't replace RAG wholesale. Use RAG to retrieve high-signal chunks, then use a long-context model to reason over the assembled context. Use pure long-context only for static, cross-document reasoning where the full corpus fits and cost is acceptable.
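As a rough illustration of the retrieve-then-reason pattern, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function names (`retrieve`, `assemble_context`, `build_prompt`) are hypothetical, the word-overlap score is a toy stand-in for a real retriever (embeddings, BM25, or similar), and the word count only approximates a model's token count. The call to the long-context model is deliberately left out, since it depends on the provider.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, chunk):
    """Toy lexical relevance: how often the query's terms occur in the chunk."""
    counts = Counter(tokenize(chunk))
    return sum(counts[term] for term in set(tokenize(query)))


def retrieve(query, chunks, k=3):
    """Return up to k highest-scoring chunks, dropping chunks with no overlap."""
    ranked = sorted(chunks, key=lambda c: score(query, c), reverse=True)
    return [c for c in ranked[:k] if score(query, c) > 0]


def assemble_context(selected, token_budget):
    """Concatenate chunks in rank order until the approximate budget is reached."""
    picked, used = [], 0
    for chunk in selected:
        cost = len(tokenize(chunk))
        if used + cost > token_budget:
            break
        picked.append(chunk)
        used += cost
    return "\n\n".join(picked)


def build_prompt(query, chunks, k=3, token_budget=2000):
    """Retrieval first, then a single prompt for the long-context model."""
    context = assemble_context(retrieve(query, chunks, k), token_budget)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."
```

The string returned by `build_prompt` would then be sent to whichever long-context model is in use. Raising `k` and `token_budget` moves the pipeline toward pure long-context behavior; lowering them moves it toward classic RAG.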
Journey Context:
Studies disagree because the winner depends on model capacity and task. Closed-source long-context models often beat RAG on Wikipedia QA, but open models gain substantially from retrieval. Pure long-context is expensive (you pay per token on every query), slower, and can bury the relevant passages in irrelevant text. Pure RAG can miss reasoning that needs a holistic view of the corpus. The hybrid pattern is now the production default: retrieval-first, then long-context synthesis.
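The per-token cost argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, the price is a placeholder rather than any provider's real rate, and the calculation covers input tokens only (it ignores output tokens, prompt caching, and the cost of running the retriever).

```python
def prompt_cost(input_tokens, price_per_million_tokens):
    """Input-side cost of one request at a flat per-token price."""
    return input_tokens / 1_000_000 * price_per_million_tokens


def compare_costs(corpus_tokens, retrieved_tokens, queries, price_per_million_tokens):
    """Compare sending the full corpus vs. only retrieved chunks on every query."""
    long_context = prompt_cost(corpus_tokens, price_per_million_tokens) * queries
    hybrid = prompt_cost(retrieved_tokens, price_per_million_tokens) * queries
    return {
        "long_context": long_context,
        "hybrid": hybrid,
        "ratio": long_context / hybrid,
    }


# Hypothetical inputs: a 1M-token corpus, 20k tokens of retrieved context,
# 100 queries, and a placeholder price of 2.0 currency units per million tokens.
result = compare_costs(1_000_000, 20_000, 100, 2.0)
```

Under these assumed numbers the full-corpus approach sends 50 times as many input tokens as the hybrid approach, so its input cost is about 50 times higher. The ratio depends only on corpus size versus retrieved-context size, not on the price.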
Lifecycle
2026-06-15T19:00:30.467719+00:00— report_created — created