Report #360
[research] Should I use RAG or just stuff everything into a long-context model?
Use long-context when the relevant corpus fits in the model's high-recall working window (roughly 32K-128K tokens) and the task needs cross-document synthesis. Use RAG (dense+sparse hybrid retrieval plus a reranker) when the corpus is far larger than the window, updates frequently, or the task is narrow factual lookup or numerical reasoning over many documents. Benchmark on your own data; no universal rule holds across models.
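The rule of thumb above can be written as a small routing function. This is a minimal sketch, not a validated policy: the 128K threshold is just the upper end of the range quoted above, and the `Workload` fields, the task labels, and the name `choose_strategy` are hypothetical names made up for illustration. It also simplifies "far larger than the window" to "larger than the window".

```python
from dataclasses import dataclass

# Assumption: upper end of the 32K-128K "high-recall" range quoted above.
# Tune this per model after benchmarking on your own data.
HIGH_RECALL_WINDOW_TOKENS = 128_000

# Illustrative task labels, not a standard taxonomy.
RAG_FAVORED_TASKS = {"factual_lookup", "numerical_reasoning"}


@dataclass
class Workload:
    corpus_tokens: int        # size of the relevant corpus, in tokens
    updates_frequently: bool  # does the corpus change often?
    task: str                 # e.g. "cross_doc_synthesis", "factual_lookup"


def choose_strategy(w: Workload, window_tokens: int = HIGH_RECALL_WINDOW_TOKENS) -> str:
    """Return "rag" or "long_context" as a starting point for benchmarking."""
    if w.corpus_tokens > window_tokens:
        return "rag"  # corpus does not fit in the high-recall window
    if w.updates_frequently:
        return "rag"  # frequently changing data favors retrieval
    if w.task in RAG_FAVORED_TASKS:
        return "rag"  # narrow lookup or numerical reasoning over many documents
    return "long_context"  # corpus fits and the task needs synthesis


if __name__ == "__main__":
    print(choose_strategy(Workload(50_000, False, "cross_doc_synthesis")))
    print(choose_strategy(Workload(5_000_000, False, "cross_doc_synthesis")))
```

Treat the result as the first configuration to benchmark, not as a final answer.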
Journey Context:
Research is mixed. Li et al. (2025) found that long-context outperforms RAG on Wikipedia QA, while RAG wins on dialogue and cost. The LaRA benchmark (ICML 2025) concluded that neither is a silver bullet and that the best choice depends on model capability, context length, task type, and retrieval quality. The UDA benchmark showed long-context LLMs underperforming RAG on numerical/financial reasoning. Full context is simpler to build, but it is slower, costlier per query, and suffers from attention decay. RAG adds retrieval failure modes but scales to million-token corpora and keeps per-query cost flat. GraphRAG is the right extension for global, multi-hop sensemaking over very large corpora.
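The "costlier per query" versus "flat per-query cost" point comes down to simple arithmetic on input tokens. The sketch below uses made-up illustrative numbers (8 retrieved chunks of 500 tokens, a 100-token question) and hypothetical function names. It counts only the tokens sent to the generator, and ignores embedding, index, and reranker costs, which a real comparison would need to include.

```python
def long_context_input_tokens(corpus_tokens: int, question_tokens: int) -> int:
    """Full-context prompting sends the whole corpus with every query."""
    return corpus_tokens + question_tokens


def rag_input_tokens(top_k: int, chunk_tokens: int, question_tokens: int) -> int:
    """RAG sends only the top_k retrieved chunks, regardless of corpus size."""
    return top_k * chunk_tokens + question_tokens


if __name__ == "__main__":
    # Illustrative assumptions, not measurements.
    top_k, chunk_tokens, question_tokens = 8, 500, 100

    for corpus_tokens in (50_000, 100_000, 1_000_000):
        full = long_context_input_tokens(corpus_tokens, question_tokens)
        rag = rag_input_tokens(top_k, chunk_tokens, question_tokens)
        print(f"corpus={corpus_tokens:>9,}  full-context={full:>9,}  rag={rag:>6,}")
```

Under these assumptions the RAG prompt stays at 4,100 input tokens however large the corpus gets, while the full-context prompt grows with the corpus and stops being feasible once it exceeds the model's window.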
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T05:41:20.261069+00:00— report_created — created