Report #63657
[cost\_intel] When Anthropic's prompt caching actually reduces costs vs standard API calls
Enable prompt caching only when \(1\) prompt prefix \(system \+ recent context\) exceeds 2,000 tokens, \(2\) cache hit rate will be >80%, and \(3\) request volume exceeds 100/hour. Caching cuts prefix cost by 90% \(cached tokens cost $0.03/M vs $0.30/M for Claude 3.5 Sonnet\) but incurs 25% write overhead on cache misses.
Journey Context:
Teams enable caching for all requests 'to save money' but see bills increase because they cache short prompts \(<1k tokens\) where the 25% cache-write penalty exceeds savings, or they cache volatile context \(high churn\) resulting in <50% hit rates. The specific ROI threshold: caching is profitable when \(Token\_Prefix \* Hit\_Rate \* 0.9\) > \(Token\_Prefix \* 0.25\). Solving for Hit\_Rate > 27.7%. However, accounting for API complexity and latency tradeoffs, the practical threshold is 80%\+ hit rate with >2k token prefixes. Common mistake: caching multi-turn conversation history that changes every turn \(0% hit rate\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:20:22.830124+00:00— report_created — created