Report #26230
[cost_intel] Paying for massive context windows when simple RAG suffices
Use RAG to inject only relevant chunks rather than dumping entire documents into the context window. Context window costs scale linearly with input tokens.
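As a rough illustration of that linear scaling, the sketch below compares the input cost of a prompt holding a whole document against one holding only a few retrieved chunks. The price per million tokens, the token counts, and the function name input_cost_usd are hypothetical placeholders chosen for the arithmetic, not figures from any provider.

```python
def input_cost_usd(input_tokens: int, usd_per_million_tokens: float) -> float:
    """Input cost grows linearly with the number of input tokens."""
    return input_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical numbers, chosen only for illustration.
PRICE = 3.00                 # assumed USD per 1M input tokens
FULL_DOC_TOKENS = 150_000    # whole document placed in the prompt
RAG_TOKENS = 5 * 800         # five retrieved chunks of about 800 tokens each

full_cost = input_cost_usd(FULL_DOC_TOKENS, PRICE)
rag_cost = input_cost_usd(RAG_TOKENS, PRICE)

print(f"full context: ${full_cost:.4f} per request")
print(f"RAG chunks:   ${rag_cost:.4f} per request")
print(f"ratio:        {full_cost / rag_cost:.1f}x")
```

Because the relationship is linear, the cost ratio equals the token ratio, so the saving depends entirely on how small the retrieved context can be made while still answering the question.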
Journey Context:
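With models supporting 128k-200k tokens, it's tempting to just stuff the whole codebase or document into the prompt. However, you pay for every input token on every request, and models can suffer from 'lost in the middle' degradation, where information buried in the middle of a long context is recalled less reliably than information near the start or end. RAG adds engineering complexity (chunking, indexing, retrieval) but can drastically reduce input token costs and often improves accuracy by focusing the model's attention on the relevant passages.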
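A minimal sketch of the retrieval step, assuming the documents are already split into chunks. It ranks chunks by plain keyword overlap with the query, which is a stand-in for the embedding similarity a real RAG pipeline would more likely use. The names tokenize, top_k_chunks, and build_prompt, and the sample chunks, are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the query.

    Keyword overlap is a simplification; embedding similarity is the
    more common scoring method in practice.
    """
    query_words = tokenize(query)
    ranked = sorted(
        chunks,
        key=lambda c: len(query_words & tokenize(c)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Build a prompt containing only the retrieved chunks, not every chunk."""
    context = "\n---\n".join(top_k_chunks(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical sample data.
CHUNKS = [
    "Refunds are processed within five business days.",
    "The API rate limit is 100 requests per minute.",
    "Passwords must be rotated every 90 days.",
]

prompt = build_prompt("What is the API rate limit?", CHUNKS, k=1)
print(prompt)
```

Only the selected chunk reaches the model, so the input token count tracks k and the chunk size rather than the size of the full corpus.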
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:25:53.932818+00:00— report_created — created