Report #2118
[research] Should I use RAG or just stuff everything into a long-context window?
Use RAG when the corpus changes often, answers must cite sources, or per-query cost matters. Use long-context only when the task requires holistic reasoning over a static document or a full codebase and you can afford the latency and cost. The best production systems tend to be hybrid: they retrieve summaries or chunks first, then expand the most relevant full documents into the long context.
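The hybrid pattern described above (rank by summary, then expand the best full documents) can be sketched as follows. This is a minimal illustration, not a production retriever: the `Doc` type, `overlap_score`, `build_hybrid_context`, and the whitespace tokenizer are all hypothetical stand-ins, and a real system would likely use embeddings or BM25 for ranking and the model's own tokenizer for the budget.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical document record: a short summary plus the full text.
    doc_id: str
    summary: str
    full_text: str


def tokens(text: str) -> list[str]:
    # Crude whitespace tokenizer; a stand-in for a real model tokenizer.
    return text.lower().split()


def overlap_score(query: str, text: str) -> int:
    # Toy relevance score: number of distinct query terms found in the text.
    return len(set(tokens(query)) & set(tokens(text)))


def build_hybrid_context(query, docs, top_k=2, budget_tokens=50):
    # Stage 1: rank documents by their short summaries (cheap retrieval).
    ranked = sorted(docs, key=lambda d: overlap_score(query, d.summary), reverse=True)
    # Stage 2: expand the best candidates to full text while the budget allows.
    selected, used = [], 0
    for doc in ranked[:top_k]:
        cost = len(tokens(doc.full_text))
        if used + cost > budget_tokens:
            continue  # skip documents that would overflow the context budget
        selected.append(doc)
        used += cost
    context = "\n\n".join(f"[{d.doc_id}]\n{d.full_text}" for d in selected)
    # Returning the ids keeps the sources auditable alongside the prompt text.
    return context, [d.doc_id for d in selected]
```

Calling `build_hybrid_context("what are the api rate limits", docs, top_k=1)` returns the expanded text of the best-matching document together with its id, so the answer can still cite where its context came from.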
Journey Context:
Long-context models exhibit lost-in-the-middle degradation (information placed mid-prompt is recalled less reliably) and O(n^2) attention cost and latency in the prompt length n. RAG keeps per-query token cost roughly flat as the corpus grows and gives auditable sources, but it fails when retrieval misses the relevant passage or when chunking breaks cross-document reasoning. Research shows long-context consistently beats chunk-based RAG when resources are unlimited, while summary-based retrieval approaches long-context quality. For coding agents, repo-level context is too large to stuff blindly: retrieve files and symbols first, then reason over a focused window.
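To make the cost argument concrete, here is a back-of-the-envelope sketch. It assumes a toy model in which attention work scales with the square of the prompt length and ignores optimizations such as KV caching; the function names and the example numbers (5 chunks of 500 tokens, a 100-token question) are illustrative assumptions, not measurements.

```python
def long_context_prompt_tokens(corpus_tokens: int, question_tokens: int) -> int:
    # Stuffing everything: the prompt grows with the corpus.
    return corpus_tokens + question_tokens


def rag_prompt_tokens(top_k: int, chunk_tokens: int, question_tokens: int) -> int:
    # RAG: the prompt depends on top_k and chunk size, not on corpus size.
    return top_k * chunk_tokens + question_tokens


def attention_pairs(prompt_tokens: int) -> int:
    # Toy proxy for O(n^2) self-attention work: number of token pairs.
    return prompt_tokens ** 2


if __name__ == "__main__":
    rag = rag_prompt_tokens(top_k=5, chunk_tokens=500, question_tokens=100)
    for corpus in (10_000, 100_000):
        full = long_context_prompt_tokens(corpus, question_tokens=100)
        ratio = attention_pairs(full) / attention_pairs(rag)
        print(f"corpus={corpus}: long-context={full} tokens, RAG={rag} tokens, "
              f"attention-pair ratio={ratio:.1f}x")
```

Under this model, a tenfold longer prompt means a hundredfold more token pairs, while the RAG prompt stays the same size no matter how large the corpus becomes.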
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T09:58:35.377328+00:00 - report_created - created