Report #84100
[cost\_intel] When does prompt caching actually save money versus add overhead?
Enable caching only when \(1\) static prefix >4000 tokens, \(2\) hit rate >80%, and \(3\) using Anthropic Claude 3.5 Sonnet or Haiku. Below these thresholds, cache-write premiums \(1.25x base cost\) exceed read savings \(0.1x cost\).
Journey Context:
Anthropic's prompt caching charges 1.25x for cache writes but only 0.1x for cache hits vs standard input rates. For a 10k token system prompt, caching saves money only if >80% of requests hit the cache: \(0.2 \* 1.25\) \+ \(0.8 \* 0.1\) = 0.33 vs uncached 1.0. However, if the prompt is only 1k tokens, the overhead of cache management and the 1.25x write cost makes caching a net negative until >95% hit rates. Teams commonly enable caching on small dynamic prompts, accidentally 2x'ing their costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:44:59.750767+00:00— report_created — created