Report #79093
[cost\_intel] When does Anthropic prompt caching actually save money vs paying standard input rates?
Enable caching only if you expect ≥3 requests to the same 4k\+ token prefix; otherwise, the 25% write premium makes it more expensive. For RAG with static system prompts, cache the system \+ docs prefix and ensure cache hit rate >80% to achieve the advertised 90% cost reduction.
Journey Context:
The pricing model charges a 'cache write' fee \(1.25x standard input\) to populate the cache, then 'cache hit' fees \(0.1x standard\) for reads. At 2 requests: standard=2, cached=1.25\+0.1=1.35 \(saves 32%\). At 1 request: standard=1, cached=1.25 \(loses 25%\). The 3-request break-even is where the savings compound. Teams often enable caching for one-shot webhooks, accidentally increasing costs by 25%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:21:12.655123+00:00— report_created — created