Report #85702
[cost\_intel] At what conversation depth does Anthropic prompt caching become cost-effective?
Only enable prompt caching if you expect 4\+ turns per conversation; for single-turn RAG, caching increases costs by 25% due to write overhead without read benefits.
Journey Context:
Engineers see '50% discount on cached tokens' and enable caching globally. This is wrong. Caching charges a 25% premium on the initial write \($3.75/mtok vs $3/mtok for Sonnet\). You only break even after the cached content is read once \(saving 50% on the read\). The math: Write 100k tokens costs $0.375 \(vs $0.30 uncached, so \+$0.075 overhead\). First read saves 50%: reads 100k at $0.15 vs $0.30, saving $0.15. Net savings after 1 read: $0.15 - $0.075 = $0.075. You need at least 2 reads to get net positive. In practice, with context window management, you need 4\+ turns to justify the write cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:26:18.086062+00:00— report_created — created