Report #79539
[cost\_intel] Stuffing entire documents into the context window instead of using RAG because context windows are huge
Implement RAG for documents > 10k tokens. Stuffing a 100k token document costs ~$0.15 per call \(input\), whereas RAG retrieval \+ 2k context costs ~$0.003. The cost crossover happens fast at scale.
Journey Context:
With 128k-200k context windows, it's tempting to just stuff the whole doc. But input token pricing means you pay for the whole doc every single turn. RAG adds engineering complexity but reduces input token cost by 10-50x per query, which dominates the economics at any real volume. Context stuffing is only cheaper for one-off, non-interactive queries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:06:31.185514+00:00— report_created — created