Report #1725
[research] Should I build RAG or just stuff everything into a long-context model?
Use a hybrid: route straightforward queries through RAG to cut cost and latency, and escalate to the full long-context prompt only when retrieval is uncertain or the task needs cross-document reasoning. Implement a cheap router that runs RAG first and checks whether the model can answer from the retrieved chunks; if it cannot, fall back to the full context. For closed frontier models, long-context often wins on QA; for weaker or local models, RAG is essential.
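The RAG-first router described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `UNANSWERABLE` sentinel, `route_query`, `keyword_retrieve`, and `toy_generate` are all hypothetical names, and the toy retriever and generator stand in for a real vector search and a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: the RAG prompt tells the model to emit this sentinel when the
# retrieved chunks are not enough to answer. The exact wording is up to you.
UNANSWERABLE = "UNANSWERABLE"


@dataclass
class RoutedAnswer:
    text: str
    route: str  # "rag" or "long_context"


def route_query(
    query: str,
    documents: List[str],
    retrieve: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
) -> RoutedAnswer:
    """Try RAG first; fall back to the full context if the model declines."""
    chunks = retrieve(query, documents)
    draft = generate(query, chunks)
    if UNANSWERABLE not in draft:
        return RoutedAnswer(draft.strip(), "rag")
    # Retrieval was insufficient: pay for the full context on this query only.
    full = generate(query, documents)
    return RoutedAnswer(full.strip(), "long_context")


def keyword_retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda d: len(terms & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def toy_generate(query: str, context: List[str]) -> str:
    """Toy stand-in for an LLM: answers only if the context covers every query term."""
    terms = set(query.lower().split())
    seen = set()
    for passage in context:
        seen |= set(passage.lower().split())
    if terms <= seen:
        return "answer from %d passage(s)" % len(context)
    return UNANSWERABLE


DOCS = ["alice manages billing", "bob manages search"]

single_hop = route_query("alice billing", DOCS, keyword_retrieve, toy_generate)
cross_doc = route_query("alice bob", DOCS, keyword_retrieve, toy_generate)
print(single_hop.route, "|", cross_doc.route)
```

In this toy setup the single-document query stays on the RAG path, while the query that needs both passages is declined by the first pass and escalated. With a real model, the quality of the "can I answer this?" check is the main thing to evaluate before trusting the router.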
Journey Context:
Papers disagree because the winner depends on model capability. Self-Route (RAG-or-long-context routing) showed that long-context beats RAG for Gemini-1.5-Pro and GPT-4o, but RAG is much cheaper and the two approaches produce the same prediction on ~63% of queries. Li et al. (2025) find that open-source models with weak long-context ability need RAG, while strong closed models do better with the full context. The LaRA benchmark concluded that neither is a silver bullet: task type, retrieval quality, and context length all matter. Pure long-context suffers from attention cost that grows quadratically with context length, lost-in-the-middle effects, and distraction by irrelevant text. Pure RAG misses multi-hop and cross-document reasoning. A router aims for a cost profile close to RAG's while recovering long-context accuracy on the hard cases.
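To make the cost argument concrete, here is a back-of-the-envelope sketch of expected prompt tokens per query under a RAG-first router. The function name and all numbers are hypothetical and not taken from any of the papers above; the model also ignores output tokens, per-token price differences, and the extra latency of a second call.

```python
def expected_tokens_per_query(
    rag_tokens: float, full_tokens: float, fallback_rate: float
) -> float:
    """Every query pays for the RAG prompt; only the fallback fraction
    additionally pays for the full-context prompt."""
    if not 0.0 <= fallback_rate <= 1.0:
        raise ValueError("fallback_rate must be between 0 and 1")
    return rag_tokens + fallback_rate * full_tokens


# Hypothetical workload: 2k-token RAG prompts, 100k-token full context,
# and one query in four escalated to the full context.
routed = expected_tokens_per_query(2_000, 100_000, 0.25)
always_full = expected_tokens_per_query(0, 100_000, 1.0)
savings = 1 - routed / always_full
print(f"routed={routed:.0f} tokens, always-full={always_full:.0f} tokens")
```

With these made-up inputs the router averages 27,000 prompt tokens per query against 100,000 for always sending everything. The saving shrinks as the fallback rate rises, and once most queries escalate the router costs more than going straight to long-context, so the fallback rate on your own traffic is the number to measure.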
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T06:54:11.717881+00:00— report_created — created