Report #279
[research] Should I use RAG or just stuff the full corpus into a long-context model?
Use RAG as the default for large corpora where each query needs only a small subset, and reserve long-context for tasks that genuinely require reasoning across most of the input at once. The practical cutoff is not the context-window size but the query's coverage ratio, meaning the share of the corpus a query actually needs. As a rule of thumb, if less than 20-30% of the corpus is relevant, RAG is cheaper, faster, and often more accurate. For analyzing a single long document or performing a codebase-wide refactor, long-context wins.
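A minimal sketch of the coverage-ratio rule, assuming you can estimate how many tokens a query actually needs (for example from a retrieval pass or from knowledge of the task). `QueryProfile`, `coverage_ratio`, `choose_strategy`, and `input_cost_usd` are hypothetical names. The 0.25 threshold is an assumed midpoint of the 20-30% range, and the $3-per-million-token price used below is illustrative, not any provider's actual rate.

```python
from dataclasses import dataclass


@dataclass
class QueryProfile:
    corpus_tokens: int    # total tokens in the corpus
    relevant_tokens: int  # estimated tokens this query actually needs
    context_window: int   # model's context window, in tokens


def coverage_ratio(p: QueryProfile) -> float:
    # Share of the corpus the query needs (0.0 to 1.0).
    if p.corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return p.relevant_tokens / p.corpus_tokens


def choose_strategy(p: QueryProfile, threshold: float = 0.25) -> str:
    """Return 'long-context' only if the corpus fits AND coverage is high.

    threshold=0.25 is an assumed midpoint of the 20-30% rule of thumb.
    """
    fits = p.corpus_tokens <= p.context_window
    # Fitting in the window is a precondition, never a reason on its own.
    if fits and coverage_ratio(p) >= threshold:
        return "long-context"
    # Low coverage, or the corpus does not fit at all: retrieve instead.
    return "rag"


def input_cost_usd(tokens: int, usd_per_million: float) -> float:
    # Input is billed per token, so cost grows linearly with prompt size.
    return tokens * usd_per_million / 1_000_000
```

For example, a 500,000-token corpus where a query needs about 10,000 tokens fits in a 1M-token window but has only 2% coverage, so `choose_strategy` returns `"rag"`. At the illustrative price, stuffing the whole corpus costs $1.50 of input per query, versus about $0.03 if retrieval returns roughly the relevant 10,000 tokens.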
Journey Context:
People conflate 'fits in context' with 'the model will attend to all of it.' Long-context models suffer from lost-in-the-middle degradation (information placed in the middle of a long prompt is used less reliably than information near the start or end) and from O(n²) attention cost. Pricing is also per input token, so every query pays for the entire window. Research comparing RAG and long-context shows that the better method depends on model capacity, retrieval quality, and task type; neither is universally superior. RAG also provides source attribution and incremental updates (re-indexing only changed documents), which long-context cannot provide on its own. The emerging best practice is hybrid: retrieve summaries or chunks first, then expand the most relevant source documents into long context only when needed.
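The hybrid pattern can be sketched as a two-stage pipeline. This is a toy illustration under stated assumptions, not a production retriever: it assumes each document has a precomputed short summary, ranks summaries by plain term overlap (a stand-in for embedding or BM25 search), approximates tokens by whitespace-separated words, and expands a hit to its full text only if that fits the remaining token budget. `Doc`, `tokenize`, `overlap_score`, `build_hybrid_context`, and `EXAMPLE_DOCS` are hypothetical names. Returning document ids alongside the context is what preserves source attribution.

```python
from __future__ import annotations

from dataclasses import dataclass

PUNCT = ".,;:!?()\"'"


@dataclass
class Doc:
    doc_id: str
    summary: str    # short precomputed summary (assumed to exist)
    full_text: str  # full source document


def tokenize(text: str) -> list[str]:
    # Crude stand-in for a real tokenizer: lowercased words, edge punctuation stripped.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]


def overlap_score(query: str, text: str) -> int:
    # Number of distinct query terms that also appear in the text.
    return len(set(tokenize(query)) & set(tokenize(text)))


def build_hybrid_context(
    query: str, docs: list[Doc], top_k: int = 2, token_budget: int = 50
) -> tuple[str, list[str]]:
    """Stage 1: rank cheap summaries. Stage 2: expand top hits within a token budget.

    Returns the assembled context and the ids of the documents it came from.
    """
    scored = [(overlap_score(query, d.summary), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    hits = [d for s, d in scored[:top_k] if s > 0]

    parts: list[str] = []
    sources: list[str] = []
    used = 0
    for d in hits:
        full_len = len(tokenize(d.full_text))
        summary_len = len(tokenize(d.summary))
        if used + full_len <= token_budget:
            # Expand: the full source document fits in the remaining budget.
            parts.append(f"[{d.doc_id}]\n{d.full_text}")
            used += full_len
        elif used + summary_len <= token_budget:
            # Too long to expand; keep the retrieved summary instead.
            parts.append(f"[{d.doc_id}] (summary)\n{d.summary}")
            used += summary_len
        else:
            break  # budget exhausted
        sources.append(d.doc_id)
    return "\n\n".join(parts), sources


EXAMPLE_DOCS = [
    Doc("billing", "Billing service retries failed payments.",
        "The billing service retries failed payments three times with "
        "exponential backoff before marking the invoice unpaid."),
    Doc("auth", "Auth service issues session tokens.",
        "The auth service issues signed session tokens that expire after one hour."),
    Doc("search", "Search indexes the product catalog nightly.",
        "The search service rebuilds the product catalog index every night."),
]
```

With the default budget, `build_hybrid_context("How does billing retry failed payments?", EXAMPLE_DOCS)` expands the full billing document and returns `['billing']` as its sources; lowering `token_budget` to 10 keeps only the billing summary. In a real system, `overlap_score` would be replaced by your retriever and `tokenize` by the model's own tokenizer.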
Lifecycle
2026-06-13T02:40:18.788378+00:00— report_created — created