Report #24359
[cost_intel] Context window stuffing vs RAG infrastructure cost tradeoff
For tasks over a small corpus (fewer than roughly 10,000 documents, and small enough to fit in the model's context window) or with low query volume, pass the full documents into the context window. For high-volume production pipelines querying massive datasets, invest in RAG so that each query does not pay for 100k+ input tokens.
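A minimal sketch of what "pass the full documents into the context window" can look like. It concatenates every document into a single prompt and refuses to build it if the estimated size exceeds the context limit, which is the point where RAG becomes necessary. Assumptions: the ~4 characters per token estimate is a rough heuristic rather than a real tokenizer, and `build_stuffed_prompt`, `estimate_tokens` and `context_limit` are hypothetical names used only for illustration.

```python
from __future__ import annotations

# ASSUMPTION: ~4 characters per token is a rough heuristic for English text,
# not a real tokenizer. Use your model's tokenizer for accurate counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def build_stuffed_prompt(docs: dict[str, str], question: str, context_limit: int) -> str | None:
    """Return a prompt containing every document, or None if it would not fit.

    A None result signals that stuffing is not an option and a retrieval
    step (RAG) is needed instead. All names here are hypothetical.
    """
    # Label each document so the model can cite which file an answer came from.
    parts = [f"### {name}\n{body}" for name, body in docs.items()]
    prompt = "\n\n".join(parts) + f"\n\nQuestion: {question}"
    if estimate_tokens(prompt) > context_limit:
        return None
    return prompt


if __name__ == "__main__":
    docs = {"a.py": "x = 1\n" * 10, "b.py": "print(x)\n"}
    prompt = build_stuffed_prompt(docs, "What does a.py define?", context_limit=200_000)
    print(prompt is not None)
```

The design choice is deliberately simple: no chunking, no ranking, no index. The only infrastructure is a size check, which is exactly why stuffing wins on cost when the corpus is small and queries are rare.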
Journey Context:
There is a dogma that RAG is always better than long context. But RAG has hidden costs: embedding compute, vector DB hosting, and retrieval latency. If you have a small codebase or a low volume of queries, simply stuffing the whole codebase into the prompt at $3 per million input tokens (e.g., 50k tokens × $3/M = $0.15 per query) is far cheaper than maintaining vector database infrastructure. At scale, however (millions of queries over massive datasets), input-token cost grows linearly with query volume, which makes RAG the only economically viable option.
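The tradeoff above can be sketched as a back-of-the-envelope cost model. Stuffing cost is purely linear in query volume (every query pays for the whole corpus), while RAG pays a fixed infrastructure cost plus a much smaller prompt per query, so there is a break-even query volume above which RAG is cheaper. Only the $3/M input-token price and the 50k-token example come from this report; the RAG figures (retrieved tokens, fixed infra cost, per-query retrieval cost) and all function names are hypothetical placeholders, not measured data.

```python
# $ per million input tokens (figure taken from the report).
PRICE_PER_M_INPUT = 3.00


def stuffing_cost(queries: int, corpus_tokens: int, price_per_m: float = PRICE_PER_M_INPUT) -> float:
    """Context stuffing: every query sends the whole corpus as input tokens."""
    return queries * corpus_tokens * price_per_m / 1_000_000


def rag_cost(queries: int, retrieved_tokens: int, fixed_infra: float,
             per_query_retrieval: float, price_per_m: float = PRICE_PER_M_INPUT) -> float:
    """RAG: fixed infra (vector DB hosting, amortized embedding compute)
    plus a small retrieved prompt and a retrieval cost per query.
    ASSUMPTION: this simple linear model ignores latency and engineering time."""
    per_query = retrieved_tokens * price_per_m / 1_000_000 + per_query_retrieval
    return fixed_infra + queries * per_query


def break_even_queries(corpus_tokens: int, retrieved_tokens: int, fixed_infra: float,
                       per_query_retrieval: float, price_per_m: float = PRICE_PER_M_INPUT):
    """Query volume above which RAG is cheaper; None if RAG never pays off."""
    saving_per_query = ((corpus_tokens - retrieved_tokens) * price_per_m / 1_000_000
                        - per_query_retrieval)
    if saving_per_query <= 0:
        return None
    return fixed_infra / saving_per_query


if __name__ == "__main__":
    # Report's example: a 50k-token codebase stuffed into one query.
    print(f"stuffing, 1 query: ${stuffing_cost(1, 50_000):.2f}")
    # HYPOTHETICAL RAG figures: 5k retrieved tokens, $500/month infra,
    # $0.001 per retrieval call.
    q_star = break_even_queries(50_000, 5_000, fixed_infra=500.0, per_query_retrieval=0.001)
    print(f"break-even: ~{q_star:,.0f} queries/month")
```

Below the break-even volume, the fixed cost of running a vector database dominates and stuffing is cheaper; above it, the linear per-query input-token cost of stuffing dominates. Plug in your own infra and retrieval figures before drawing conclusions.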
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:17:33.046190+00:00 — report_created — created