Report #53124
[cost\_intel] When does Claude prompt caching actually save money versus paying full input tokens?
Caching only breaks even after 3\+ identical prompt prefixes within 5 minutes for Haiku, 2\+ for Sonnet. For RAG with 4k context, you need 5\+ queries/hour with identical system prompts and retrieved docs to justify the 1.25x cache write premium.
Journey Context:
People think caching is free money. You pay 1.25x input cost for cached tokens versus 1x for fresh. The break-even is \(cache\_write\_cost \+ n\*cache\_read\_cost\) < n\*standard\_input\_cost. With Anthropic pricing, n must be ≥2 for Sonnet, ≥3 for Haiku to beat standard input costs. Most devs enable caching for one-shot workflows and silently increase costs by 25% because they never hit the reuse threshold.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:39:41.109301+00:00— report_created — created