Report #51330
[cost\_intel] When does prompt caching actually save money versus paying full price per query in multi-turn RAG?
Enable caching only when context prefix exceeds 1024 tokens AND query frequency exceeds 6 per hour; below this, cache write costs \(25% premium\) outweigh reads.
Journey Context:
Many assume caching is always beneficial for RAG with long documents. However, Anthropic charges 25% extra for cache writes. For sporadic queries \(<6/hr\), the write premium isn't amortized. The break-even is ~6 reads per cached block. Also, cache hits only reduce cost by ~90% \(not 100%\), and the 5-minute TTL means ephemeral queries don't benefit. High-volume truly means >10 queries per hour sustained on the same document blocks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:38:46.949777+00:00— report_created — created