Report #96903
[synthesis] Whether to rely on large context windows or RAG for providing codebase context to AI models
Decouple retrieval from the context window. Use a hybrid retrieval system (vector + keyword/AST search) to fetch highly relevant snippets, rather than stuffing entire files into the prompt. Treat the context window as a limited workspace, not a database.
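A minimal sketch of the hybrid idea, assuming chunks are already extracted and keyed by an ID. The function names (`keyword_rank`, `vector_rank`, `hybrid_search`) are hypothetical, the character-trigram `toy_embed` is only a stand-in for a real embedding model, and reciprocal rank fusion is one common way to merge the two rankings, not necessarily what any particular product uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumerics, so snake_case splits into words.
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_rank(query, chunks):
    # Rank chunk IDs by the number of query-token occurrences they contain.
    q = set(tokenize(query))
    scores = {cid: sum(1 for t in tokenize(text) if t in q)
              for cid, text in chunks.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cid for cid, s in ranked if s > 0]


def toy_embed(text):
    # ASSUMPTION: stand-in for a real embedding model (character trigram counts).
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def vector_rank(query, chunks):
    # Rank chunk IDs by cosine similarity to the query vector.
    qv = toy_embed(query)
    scores = {cid: cosine(qv, toy_embed(text)) for cid, text in chunks.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cid for cid, s in ranked if s > 0]


def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking contributes 1 / (k + rank) per chunk; higher total is better.
    fused = Counter()
    for ranking in rankings:
        for rank, cid in enumerate(ranking, start=1):
            fused[cid] += 1.0 / (k + rank)
    return [cid for cid, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]


def hybrid_search(query, chunks, top_k=3):
    rankings = [keyword_rank(query, chunks), vector_rank(query, chunks)]
    return reciprocal_rank_fusion(rankings)[:top_k]


# Hypothetical chunk store: ID -> source snippet.
chunks = {
    "auth.py::login": "def login(user, password): verify_password(user, password)",
    "db.py::connect": "def connect(dsn): return Pool(dsn)",
    "ui.py::render": "def render(template): return html",
}
top = hybrid_search("verify password login", chunks)
```

Only the IDs in `top` would then be loaded into the prompt; the rest of the codebase stays in the index.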
Journey Context:
With the advent of very large context windows (up to 1M+ tokens in Gemini 1.5, 200K in Claude 3), a narrative emerged that RAG is dead and you can just 'put the whole codebase in the prompt.' However, architectural signals from Cursor (@codebase) and Sourcegraph (Cody) show they still rely heavily on retrieval. Why? Attention dilution: LLMs suffer from 'lost in the middle' degradation, where information buried mid-prompt is recalled less reliably than information near the start or end, and processing 1M tokens is computationally expensive and slow. Production systems use retrieval (often keyword/regex + embedding) to find roughly the top 20-50 most relevant chunks, keeping the context window small and the signal high. The synthesis is that large context windows suit long conversational histories and complex individual files, but codebase-level awareness still requires an external retrieval index.
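To illustrate "keeping the context window small," here is a sketch of packing already-ranked chunks into a fixed token budget. The names `estimate_tokens`, `pack_context`, and `render_prompt` are hypothetical, and the four-characters-per-token estimate is a rough assumption; a real system would use the model's own tokenizer.

```python
def estimate_tokens(text):
    # ASSUMPTION: crude heuristic of about 4 characters per token.
    return max(1, len(text) // 4)


def pack_context(ranked_chunks, budget_tokens, max_chunks=50):
    """Greedily keep chunks in rank order until the token budget is spent.

    ranked_chunks: list of (path, text) pairs, most relevant first.
    Returns the selected pairs and the estimated tokens used.
    """
    selected, used = [], 0
    for path, text in ranked_chunks[:max_chunks]:
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # too large for the remaining budget; try smaller chunks
        selected.append((path, text))
        used += cost
    return selected, used


def render_prompt(question, selected):
    # Label each snippet with its path so the model can cite locations.
    parts = [f"# {path}\n{text}" for path, text in selected]
    return "\n\n".join(parts) + f"\n\nQuestion: {question}"


# Hypothetical ranked retrieval output, most relevant first.
ranked = [
    ("a.py", "x" * 40),   # about 10 tokens
    ("b.py", "y" * 400),  # about 100 tokens, exceeds the budget below
    ("c.py", "z" * 20),   # about 5 tokens
]
selected, used = pack_context(ranked, budget_tokens=20)
prompt = render_prompt("Where is login handled?", selected)
```

The budget, not the model's maximum window, decides how much retrieved code reaches the prompt, which leaves headroom for conversation history and the file being edited.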
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:14:01.214469+00:00— report_created — created