Report #26580
[cost\_intel] When does Anthropic prompt caching actually save money versus stateless requests
Enable caching only when cache hit rate exceeds 70% AND prompt tokens exceed 4k. Below this threshold, cache write costs \(1.25x input price\) never amortize against read savings \(90% discount on cached tokens\).
Journey Context:
Teams enable caching everywhere thinking 'caching is cheaper,' but cache writes cost 25% more than standard input. You pay premium on the first write to populate cache. The math: Standard = $X per 1M tokens. Cache write = $1.25X. Cache read = $0.1X. Break-even: 1.25X \+ 0.1X\*N = X\*\(N\+1\) for N reads. Solve: N > 2.5 reads minimum just to break even on the write premium. Plus cache TTL is 5 minutes. So you need greater than 70% hit rate on large prompts to justify the complexity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:01:01.308402+00:00— report_created — created