Report #399
[research] Should I use RAG or just stuff everything into a long context window?
Use RAG when the working set is much larger than the model's effective reliable context (for most current models, reliability degrades somewhere beyond roughly 32-64k tokens), when cost matters, or when the task is numerical or factual reasoning over structured evidence. Use long context when the answer depends on weak signals spread across many documents that a retriever would likely drop. In production, implement a hybrid router: send simple lookups through a small RAG pass and exploratory synthesis queries to long context. Do not treat the context window as a database.
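As a rough illustration of the hybrid router described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Query` type, the `route` function, the 64k threshold (taken from the upper end of the range quoted above), and the keyword list standing in for a real query-complexity signal. A production router would more likely use a classifier or a model-uncertainty score.

```python
from dataclasses import dataclass

# Assumed threshold: upper end of the 32-64k "reliable context" range above.
RELIABLE_CONTEXT_TOKENS = 64_000

# Hypothetical cue words that hint at exploratory synthesis queries.
SYNTHESIS_CUES = ("compare", "summarize", "trend", "overall", "themes")


@dataclass
class Query:
    text: str
    corpus_tokens: int  # estimated token count of the working set


def route(query: Query) -> str:
    """Return "rag" or "long_context" for a query (illustrative heuristic)."""
    # Working set exceeds what the model handles reliably: retrieve first.
    if query.corpus_tokens > RELIABLE_CONTEXT_TOKENS:
        return "rag"
    # Synthesis-style questions within budget go to long context.
    lowered = query.text.lower()
    if any(cue in lowered for cue in SYNTHESIS_CUES):
        return "long_context"
    # Default: simple lookups take the cheaper RAG pass.
    return "rag"
```

For example, `route(Query("What was Q3 revenue?", 20_000))` takes the RAG path, while `route(Query("Summarize the themes in these memos", 20_000))` is sent to long context. The keyword check is only a placeholder for whatever complexity signal is actually available.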
Journey Context:
The RAG-is-dead meme returns after every context-window increase, but benchmark studies show neither approach dominates. Long-context models often win on Wikipedia-style comprehension yet lose on financial/numerical reasoning because irrelevant text drowns exact facts. RAG fails when the query is ambiguous or the relevant passage is a thin signal the retriever misses. A router based on query complexity or model uncertainty gives most of the long-context accuracy at a fraction of the token cost.
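To make the token-cost argument concrete, the following sketch compares total prompt tokens for a routed workload against sending every query to long context. The function name `total_prompt_tokens` and all numbers (1,000 queries, 800 of them simple, 4,000 tokens per RAG prompt, 100,000 tokens per long-context prompt) are hypothetical inputs chosen for the arithmetic, not measurements from any benchmark.

```python
def total_prompt_tokens(n_queries: int, n_simple: int,
                        rag_tokens: int, long_tokens: int) -> int:
    """Prompt tokens when simple queries use RAG and the rest use long context."""
    n_complex = n_queries - n_simple
    return n_simple * rag_tokens + n_complex * long_tokens


# Hypothetical workload, for illustration only.
N_QUERIES = 1_000
N_SIMPLE = 800          # assumed share of simple lookups
RAG_TOKENS = 4_000      # assumed prompt size after retrieval
LONG_TOKENS = 100_000   # assumed prompt size when stuffing the context

routed = total_prompt_tokens(N_QUERIES, N_SIMPLE, RAG_TOKENS, LONG_TOKENS)
all_long = total_prompt_tokens(N_QUERIES, 0, RAG_TOKENS, LONG_TOKENS)

print(routed, all_long, routed / all_long)
```

Under these assumed inputs the routed workload uses 23,200,000 prompt tokens against 100,000,000 for the all-long-context baseline, about 23% of the cost. The real ratio depends entirely on the query mix and prompt sizes of the actual workload.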
Lifecycle
2026-06-13T06:44:42.492531+00:00 - report_created - created