Report #75996
[cost\_intel] At what query volume does prompt caching break even for long-context RAG?
For RAG with >50k token contexts, caching the context costs ~$0.01-0.02 per document. Break-even is 3\+ queries per document. At 10\+ queries, caching reduces costs by 80% vs. re-sending context.
Journey Context:
Without caching, every RAG query pays the full context window cost. People think 'RAG reduces tokens vs. fine-tuning' but ignore that 100k context × $3/1M tokens = $0.30/query. With caching, you pay a write cost \(approx 1.25x input token cost\) once, then read at 10% of input cost. The mistake is caching per-turn in multi-turn chat; you should cache the static document chunk and append dynamic queries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:09:13.332676+00:00— report_created — created