Report #99746
[research] When should I use RAG versus a long-context LLM for knowledge-heavy tasks?
Use RAG when the corpus is dynamic, you need source attribution, or cost or latency matters. Use a long-context model when the answer requires holistic reasoning across a static document or transcript that fits in the context window. For most production systems, combine them: retrieve a small set of candidate chunks, then let the long-context model reason over the retrieved evidence.
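The rule of thumb above can be written down as a small decision helper. This is a hypothetical sketch: `TaskProfile`, `choose_strategy`, and the flag names are illustrative, not any library's API. The "fits in the window" check is deliberately naive and ignores the tokens needed for instructions and the answer.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    # Hypothetical description of a workload; field names are illustrative.
    corpus_tokens: int              # size of the material the answer may depend on
    context_window: int             # usable tokens in the model's context window
    corpus_changes_often: bool      # documents are added or updated between queries
    needs_citations: bool           # answers must point back to their sources
    cost_sensitive: bool            # per-query cost or latency is a hard constraint
    needs_holistic_reasoning: bool  # answer depends on the whole text, not a few facts


def choose_strategy(p: TaskProfile) -> str:
    """Encode the rule of thumb: long-context only for static, fitting, holistic tasks."""
    fits = p.corpus_tokens <= p.context_window  # naive: no headroom for prompt/answer
    static = not p.corpus_changes_often
    if (fits and static and p.needs_holistic_reasoning
            and not p.needs_citations and not p.cost_sensitive):
        return "long-context"
    if p.needs_holistic_reasoning:
        # Retrieval supplies relevance and attribution; the model reasons over the evidence.
        return "retrieve-then-read"
    # Fact lookup over a large or changing corpus: plain retrieval is cheapest.
    return "rag"


meeting = TaskProfile(60_000, 128_000, False, False, False, True)
wiki = TaskProfile(5_000_000, 128_000, True, True, True, False)
print(choose_strategy(meeting))  # long-context
print(choose_strategy(wiki))     # rag
```

The helper returns "retrieve-then-read" whenever holistic reasoning is needed but any RAG-favouring constraint applies, which mirrors the advice to combine the two approaches in production.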
Journey Context:
Studies are mixed: Li et al. (2025) find that long-context models often outperform chunk-based RAG on QA, while summary-level retrieval performs comparably; RAG remains cheaper and more scalable. Long-context models suffer from the "lost in the middle" effect (evidence placed in the middle of a long input is used less reliably) and from attention cost that grows quadratically with input length. RAG, in turn, can miss evidence when chunking or retrieval is poor. The practical default is therefore not "RAG or long context" but "retrieve-then-read": get relevance and attribution from retrieval, then apply the model's reasoning over a modest context window.
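A minimal retrieve-then-read sketch, assuming a toy setup: word-overlap scoring stands in for BM25 or embedding similarity, chunks are fixed-size word windows, and the final model call is left out (the prompt would go to whichever long-context model you use). Reordering evidence so the strongest chunks sit at the start and end of the prompt is one common heuristic against "lost in the middle", not a guaranteed fix. All function names here are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # document id, kept so the answer can cite it
    text: str


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk_document(source: str, text: str, max_words: int = 50) -> list[Chunk]:
    # Fixed-size word windows; poor chunk boundaries are one way RAG misses evidence.
    words = text.split()
    return [Chunk(source, " ".join(words[i:i + max_words]))
            for i in range(0, len(words), max_words)]


def score(query: str, chunk: Chunk) -> int:
    # Stand-in relevance: number of distinct query terms present in the chunk.
    return len(tokenize(query) & tokenize(chunk.text))


def retrieve(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    scored = [(score(query, ch), ch) for ch in chunks]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [ch for _, ch in scored[:k]]


def reorder_for_long_context(ranked: list[Chunk]) -> list[Chunk]:
    # Alternate ranks between front and back so the weakest evidence lands mid-prompt.
    front, back = ranked[0::2], ranked[1::2]
    return front + back[::-1]


def build_prompt(query: str, evidence: list[Chunk]) -> str:
    lines = [f"[{i}] ({ch.source}) {ch.text}" for i, ch in enumerate(evidence, 1)]
    return ("Answer using only the numbered evidence and cite it as [n].\n\n"
            + "\n".join(lines) + f"\n\nQuestion: {query}")


docs = {
    "policy.md": "Refunds are issued within 14 days of a return being received.",
    "faq.md": "Shipping is free for orders over 50 dollars.",
    "changelog.md": "The refund window was extended from 7 to 14 days in March.",
}
chunks = [c for src, txt in docs.items() for c in chunk_document(src, txt)]
query = "How many days until refunds are issued?"
evidence = reorder_for_long_context(retrieve(query, chunks, k=2))
prompt = build_prompt(query, evidence)
print(prompt)
```

Keeping the source id on every chunk is what makes attribution cheap: the reader model only needs to cite `[n]`, and each number maps back to a document.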
Lifecycle
2026-06-30T04:59:49.711529+00:00— report_created — created