Report #45959
[counterintuitive] Put the entire codebase or all documents in context instead of using RAG
Continue using RAG and targeted context injection even with large context windows (e.g., 128k+ tokens) to minimize cost, reduce latency, and mitigate 'lost in the middle' retrieval degradation.
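A minimal sketch of targeted context injection, assuming a plain-Python bag-of-words cosine similarity as a stand-in for the embedding model and vector index a production RAG pipeline would normally use. `chunk`, `retrieve`, and `build_prompt` are hypothetical helpers, and the 50-word chunk size and `k=3` default are illustrative, not tuned values. The point it shows: only the top-k scoring chunks reach the prompt, not the whole corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a stand-in for a real model tokenizer.
    return re.findall(r"[a-z0-9_]+", text.lower())


def chunk(text: str, max_words: int = 50) -> list[str]:
    # Hypothetical chunking policy: fixed-size windows of whitespace-separated words.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    # Rank every chunk against the query and keep only the k best.
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str], k: int = 3, max_words: int = 50) -> str:
    # Inject only the retrieved chunks instead of the full document set.
    chunks = [c for d in docs for c in chunk(d, max_words)]
    context = "\n---\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    docs = [
        "The billing service retries failed charges three times.",
        "The auth service issues JWT tokens valid for one hour.",
        "Logs are shipped to the central collector every minute.",
    ]
    print(build_prompt("How long are auth tokens valid?", docs, k=1))
```

With `k=1`, the example prompt contains only the auth-service sentence; the billing and logging text never reaches the model. Swapping `cosine` for embedding similarity changes the ranking quality, not the shape of the pipeline.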
Journey Context:
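With massive context windows, developers are tempted to paste entire codebases or document sets into the prompt. However, every input token must be processed (and, with per-token API pricing, billed) before the first output token appears, and the self-attention cost of processing the prompt grows quadratically with its length, so cost and latency rise steeply as context grows. More importantly, empirical studies of the 'lost in the middle' effect show that model accuracy on retrieval tasks degrades significantly when the target information is buried in the middle of a long context, compared to when it appears near the beginning or end. Targeted retrieval keeps the prompt short and puts only relevant material in front of the model, so it remains more reliable, cheaper, and faster.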
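The cost and latency argument is arithmetic, so a rough comparison helps. This sketch assumes per-token input pricing and dense self-attention; the token counts and the $3 per million input tokens price are hypothetical placeholders, not figures from any provider. `relative_attention_work` counts only the pairwise attention-score term, which grows with the square of prompt length; feed-forward and projection layers grow linearly, so end-to-end latency rises by less than this ratio.

```python
def prompt_cost_usd(input_tokens: int, usd_per_million_tokens: float) -> float:
    # Per-token pricing: cost grows linearly with prompt length.
    return input_tokens * usd_per_million_tokens / 1_000_000


def relative_attention_work(tokens: int, baseline_tokens: int) -> float:
    # Dense self-attention over the prompt compares every token with every other,
    # so the attention-score work scales with n^2 (linear layers are ignored here).
    return (tokens / baseline_tokens) ** 2


# Hypothetical sizes: the whole codebase pasted in vs. top-k retrieved chunks.
FULL_CONTEXT = 120_000
RETRIEVED = 4_000
PRICE = 3.0  # hypothetical USD per million input tokens

if __name__ == "__main__":
    print(f"full:      ${prompt_cost_usd(FULL_CONTEXT, PRICE):.3f} per request")
    print(f"retrieved: ${prompt_cost_usd(RETRIEVED, PRICE):.3f} per request")
    print(f"attention work ratio: {relative_attention_work(FULL_CONTEXT, RETRIEVED):.0f}x")
```

Running it prints `$0.360` versus `$0.012` per request and a `900x` attention-work ratio: a prompt 30 times longer costs 30 times more per call and needs roughly 900 times the attention-score computation, paid again on every request.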
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:37:02.258331+00:00— report_created — created