Report #2538
[research] Should I use RAG or just stuff everything into a long-context window?
Use RAG when the corpus is far larger than the relevant subset per query, cost/latency matter, and you need source attribution. Use long-context when the task genuinely requires reasoning across the whole document or corpus at once. In production, combine them: retrieve candidates with RAG, then reason over the retrieved set with a long-context model.
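The combined pattern in the answer above (retrieve candidates, then reason over only the retrieved set) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names (`tokenize`, `overlap_score`, `retrieve`, `build_prompt`), the toy corpus, and the word-overlap scorer are all hypothetical stand-ins. A real system would typically use an embedding or BM25 retriever and pass the prompt to an actual long-context model.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a stand-in for a real tokenizer or embedder."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, chunk: str) -> int:
    """Count occurrences of query tokens in the chunk (toy relevance score)."""
    counts = Counter(tokenize(chunk))
    return sum(counts[t] for t in set(tokenize(query)))


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (source_id, text) pairs, best first, dropping zero-score chunks."""
    scored = [(overlap_score(query, text), sid, text) for sid, text in corpus.items()]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # score descending, then id for stable ties
    return [(sid, text) for _, sid, text in scored[:k]]


def build_prompt(query: str, chunks: list[tuple[str, str]]) -> str:
    """Keep source ids next to each chunk so the answer can attribute its claims."""
    context = "\n".join(f"[{sid}] {text}" for sid, text in chunks)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {query}"
    )


# Hypothetical three-document corpus, for illustration only.
corpus = {
    "doc-a": "RAG retrieves a small relevant subset of a large corpus per query.",
    "doc-b": "Long-context models read the whole document in one window.",
    "doc-c": "Bananas are rich in potassium.",
}
question = "How does RAG pick a relevant subset?"
hits = retrieve(question, corpus, k=2)
prompt = build_prompt(question, hits)  # this string would go to the long-context model
```

Only chunks that score above zero reach the prompt, so the model pays for the retrieved set rather than the whole corpus, and the bracketed ids carry source attribution through to the answer.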
Journey Context:
The 'context windows are now infinite' narrative is misleading. Published comparisons suggest that long-context often outperforms chunk-based RAG on Wikipedia-style QA, while RAG tends to win on precise factual retrieval and dialogue. Cost and latency diverge sharply because RAG pays only for the retrieved tokens, whereas long-context pays for every token in the window on every query. The common error is adopting one architecture for the whole system. Modern agentic systems route instead: RAG handles retrieval, long-context handles synthesis, and hybrid methods such as contextualized retrieval preserve episodic ground truth.
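The cost divergence described above comes down to simple arithmetic on input tokens. The sketch below assumes a flat per-token input price and ignores output tokens, index and embedding costs, and any caching. The price and the token counts are made-up illustrative values, not measurements or real vendor pricing.

```python
def input_tokens_rag(query_tokens: int, chunk_tokens: int, k: int) -> int:
    """RAG sends the query plus k retrieved chunks."""
    return query_tokens + k * chunk_tokens


def input_tokens_long_context(query_tokens: int, corpus_tokens: int) -> int:
    """Long-context sends the query plus the entire corpus, every time."""
    return query_tokens + corpus_tokens


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Input cost under a flat per-token price (an assumption)."""
    return tokens * usd_per_million / 1_000_000


# Illustrative numbers only: a 400k-token corpus, 8 chunks of 500 tokens each.
PRICE = 1.0  # hypothetical USD per million input tokens
rag_tokens = input_tokens_rag(query_tokens=50, chunk_tokens=500, k=8)
lc_tokens = input_tokens_long_context(query_tokens=50, corpus_tokens=400_000)

rag_cost = cost_usd(rag_tokens, PRICE)
lc_cost = cost_usd(lc_tokens, PRICE)
ratio = lc_tokens / rag_tokens  # how many times more input long-context processes
```

Under these assumptions RAG sends 4,050 input tokens per query and long-context sends 400,050, so the gap scales with corpus size rather than with the size of the relevant subset.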
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T12:53:22.203580+00:00 — report_created — created