Report #2755
[research] Should I use RAG or just stuff everything into a long-context model?
Use long-context for holistic reasoning across full documents, and RAG for precise factual retrieval, source attribution, and cost-sensitive interactive queries. For most production systems, use a hybrid: run RAG first to fetch candidate chunks, then let the model reason over those chunks in a moderate context window. Do not blindly dump 100K tokens into the prompt.
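The hybrid pattern can be sketched roughly as below. This is a minimal illustration, not a production retriever: the bag-of-words cosine score stands in for a real embedding search, word count stands in for a real tokenizer, and the names `retrieve`, `build_prompt`, and `token_budget` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude word-level tokenizer; an assumption standing in for a real one.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=3):
    # Step 1 (retrieve small): rank chunks against the query, keep the top k
    # that have any overlap at all.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), i) for i, c in enumerate(chunks)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scored[:k] if score > 0]


def build_prompt(query, chunks, k=3, token_budget=200):
    # Step 2 (reason big): pack the retrieved chunks into a bounded context,
    # keeping chunk ids so the answer can cite its sources.
    picked, used = [], 0
    for i in retrieve(query, chunks, k):
        cost = len(tokenize(chunks[i]))  # rough token estimate
        if used + cost > token_budget:
            break
        picked.append(i)
        used += cost
    context = "\n".join(f"[{i}] {chunks[i]}" for i in picked)
    return f"Context:\n{context}\n\nQuestion: {query}", picked


chunks = [
    "Redis is an in-memory data store.",
    "Long-context models bill for every input token.",
    "Bananas are yellow.",
]
prompt, picked = build_prompt("Which models bill for every input token?", chunks, k=2)
```

The prompt returned here would then be sent to whichever model you use. Only the chunks that matched the query reach the model, and the bracketed ids give it something to cite.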
Journey Context:
The 'just use 1M context' meme ignores latency and cost. A meta-evaluation found that long-context generally outperforms RAG on Wikipedia QA and summarization, while RAG wins on dialogue and general-domain QA. Redis benchmarks put RAG pipelines at about 1s per query versus 30-60s for naive long-context. Long-context also bills for every input token, even when most of them are irrelevant to the question. RAG additionally gives you citations and makes semantic caching possible. The right design is retrieve small, reason big.
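The billing point is simple arithmetic, sketched below. The price is a made-up placeholder, not any provider's actual rate, and the 4,000-token retrieved context is an assumed size for illustration. Only the 100K figure comes from the report above.

```python
def input_cost(tokens, price_per_million):
    # Input-side cost of one request, in the same currency as the price.
    return tokens / 1_000_000 * price_per_million


def cost_ratio(full_tokens, retrieved_tokens):
    # How many times more input you pay for when sending everything.
    # The per-token price cancels out, so it is not needed here.
    return full_tokens / retrieved_tokens


HYPOTHETICAL_PRICE = 3.0  # assumed price per million input tokens

long_context_cost = input_cost(100_000, HYPOTHETICAL_PRICE)
rag_cost = input_cost(4_000, HYPOTHETICAL_PRICE)  # assumed retrieved size
ratio = cost_ratio(100_000, 4_000)
```

Under these assumptions the long-context request pays for 25 times as many input tokens per query, whatever the actual price is. Retrieval and embedding costs are left out of this sketch and would narrow the gap somewhat.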
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T13:53:06.413056+00:00 - report_created - created