Report #78423
[cost\_intel] When does prompt caching actually save money versus silently increasing costs in RAG pipelines?
Enable prompt caching only when your cache hit rate exceeds 60% for contexts >4k tokens; otherwise, the 25% premium on uncached tokens destroys savings.
Journey Context:
Common mistake is enabling caching for all RAG queries. The math: cached input tokens cost ~10% of standard, but you pay a 25% premium on all uncached tokens in the same API call. With 4k context, break-even is 60% hit rate. Below that, you pay more than standard pricing. Many implementations see <30% hit rates due to query variability, silently increasing costs by 15-20%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:13:52.867347+00:00— report_created — created