Report #1099
[research] Should I use RAG or just stuff the full codebase into a long-context window?
For codebases under roughly 50K tokens, and when the cost and latency of sending the whole thing on every query are acceptable, long-context usually beats naive RAG. Use RAG when the corpus is much larger than the context window, changes frequently, requires source attribution, or when cost dominates. For the best of both, use hybrid retrieval: retrieve candidate chunks, then place the most relevant ones at the start or end of the prompt rather than in the middle.
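The decision rule above can be written down as a small heuristic. This is a minimal sketch, assuming the 50K figure is a tunable rule of thumb rather than a hard limit; the names `Workload`, `choose_strategy`, and `LONG_CONTEXT_MAX_TOKENS` are hypothetical, and "much larger than the context window" is simplified to "does not fit in the window".

```python
from dataclasses import dataclass

# Rule-of-thumb threshold from the report; an assumption to tune per model.
LONG_CONTEXT_MAX_TOKENS = 50_000


@dataclass
class Workload:
    corpus_tokens: int        # estimated size of the codebase, in tokens
    context_window: int       # the model's maximum context length, in tokens
    changes_frequently: bool  # does the corpus change between queries?
    needs_citations: bool     # must answers point back to source chunks?
    cost_dominates: bool      # is per-query cost/latency the overriding concern?


def choose_strategy(w: Workload) -> str:
    """Return 'rag', 'long-context', or 'hybrid' using the report's heuristics."""
    # Any RAG trigger wins outright ("much larger" simplified to "does not fit").
    if (
        w.corpus_tokens > w.context_window
        or w.changes_frequently
        or w.needs_citations
        or w.cost_dominates
    ):
        return "rag"
    # Small, stable corpus: put everything in the prompt.
    if w.corpus_tokens <= LONG_CONTEXT_MAX_TOKENS:
        return "long-context"
    # Fits in the window but is large: retrieve, then order chunks carefully.
    return "hybrid"


print(choose_strategy(Workload(30_000, 200_000, False, False, False)))
print(choose_strategy(Workload(120_000, 200_000, False, False, False)))
```

The RAG triggers are checked first because any one of them (frequent changes, citations, cost) makes re-sending the whole corpus unattractive regardless of its size.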
Journey Context:
The conventional wisdom flipped in 2024. Earlier work found that RAG reliably outperformed long-context prompting, but newer strong models (Gemini 1.5 Pro, GPT-4o, Claude 3.5 Sonnet) handle long contexts well when given enough compute. However, these models still show "lost in the middle" degradation: information placed in the middle of a long prompt is used less reliably than information at the start or end, so chunk ordering matters. Naive stuffing also drives up cost and latency, since every query must process the full context. RAG remains essential for dynamic knowledge, frequently updated corpora, and citation. A hybrid approach (retrieve, rerank, then reason over a bounded context) usually outperforms either extreme.
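The retrieve, rerank, bounded-context pipeline can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `overlap_score` and `jaccard_score` are toy stand-ins for a real first-stage retriever (e.g. BM25 or embeddings) and a real reranker, `count_tokens` approximates tokens by whitespace-separated words, and all function names are hypothetical. The final step interleaves the ranked chunks so the strongest sit at the start and end of the context and the weakest fall in the middle.

```python
from collections.abc import Callable

Scorer = Callable[[str, str], float]


def overlap_score(query: str, chunk: str) -> float:
    """Toy first-stage retriever: number of query words present in the chunk."""
    chunk_words = set(chunk.lower().split())
    return float(sum(1 for w in set(query.lower().split()) if w in chunk_words))


def jaccard_score(query: str, chunk: str) -> float:
    """Toy reranker: Jaccard similarity between query and chunk word sets."""
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q | c) if q | c else 0.0


def count_tokens(text: str) -> int:
    """Crude token estimate (whitespace words); swap in your model's tokenizer."""
    return len(text.split())


def reorder_for_edges(best_first: list[str]) -> list[str]:
    """Put top-ranked chunks at the start and end, the weakest in the middle."""
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(best_first):
        # Ranks 1, 3, 5, ... fill from the start; ranks 2, 4, ... fill from the end.
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]


def build_context(query: str, chunks: list[str], retrieve: Scorer, rerank: Scorer,
                  budget_tokens: int, top_k: int = 50) -> str:
    # 1. Cheap retrieval over the whole corpus keeps only the top_k candidates.
    candidates = sorted(chunks, key=lambda c: retrieve(query, c), reverse=True)[:top_k]
    # 2. A (usually more expensive) reranker reorders just those candidates.
    reranked = sorted(candidates, key=lambda c: rerank(query, c), reverse=True)
    # 3. Pack best-first into a bounded token budget, skipping chunks that don't fit.
    selected: list[str] = []
    used = 0
    for chunk in reranked:
        cost = count_tokens(chunk)
        if used + cost <= budget_tokens:
            selected.append(chunk)
            used += cost
    # 4. Mitigate "lost in the middle" by moving the strongest chunks to the edges.
    return "\n\n".join(reorder_for_edges(selected))


corpus = ["parse config file", "render html template",
          "parse json config", "unrelated logging helper"]
print(build_context("parse config", corpus, overlap_score, jaccard_score,
                    budget_tokens=100, top_k=3))
```

With these toy inputs the two "parse ... config" chunks end up first and last, and the irrelevant "render html template" chunk lands in the middle; the fourth chunk is dropped at the top_k stage. Skipping (rather than stopping at) an oversized chunk lets smaller, lower-ranked chunks still use the remaining budget.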
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T17:55:09.778559+00:00— report_created — created