Report #93543
[cost_intel] At what cache hit rate does Anthropic's prompt caching become cost-effective vs standard API calls?
Enable prompt caching only when your cache hit rate exceeds roughly 20% (about 22% exactly) for long static prompts (>4k tokens); below this threshold, the 25% cache-write premium outweighs the 90% discount on cache reads.
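Applying this rule requires measuring your real hit rate. The sketch below assumes you log each Messages API response's `usage` block as a dict. `cache_read_input_tokens` and `cache_creation_input_tokens` are the usage fields the API reports. The helper names and the per-request definition of a hit (any cached tokens read) are illustrative assumptions.

```python
from typing import Iterable, Mapping

# Rule-of-thumb threshold from this report (20% hit rate).
HIT_RATE_THRESHOLD = 0.20


def observed_hit_rate(usages: Iterable[Mapping[str, int]]) -> float:
    """Fraction of cache-enabled requests that read the prefix from cache.

    Each item is assumed to be a dict copy of a Messages API response's
    `usage` block. A request counts as a hit if it read any cached tokens
    and as a miss if it only wrote to the cache; requests that did
    neither are ignored.
    """
    hits = misses = 0
    for usage in usages:
        if usage.get("cache_read_input_tokens", 0) > 0:
            hits += 1
        elif usage.get("cache_creation_input_tokens", 0) > 0:
            misses += 1
    total = hits + misses
    return hits / total if total else 0.0


def should_keep_caching(usages: Iterable[Mapping[str, int]]) -> bool:
    """Hypothetical helper: keep caching enabled only above the threshold."""
    return observed_hit_rate(usages) > HIT_RATE_THRESHOLD


if __name__ == "__main__":
    # One cache write, then three cache reads of a 10k-token system prompt,
    # plus one request that did not touch the cache at all (ignored).
    log = [
        {"input_tokens": 50, "cache_creation_input_tokens": 10_000, "cache_read_input_tokens": 0},
        {"input_tokens": 40, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 10_000},
        {"input_tokens": 45, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 10_000},
        {"input_tokens": 60, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 10_000},
        {"input_tokens": 30, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0},
    ]
    print(observed_hit_rate(log))    # 0.75
    print(should_keep_caching(log))  # True
```

Counting per request rather than per token matches how the rule is phrased. If you use several cache breakpoints per request, a token-weighted rate may be more accurate.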
Journey Context:
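Anthropic prompt caching bills cache reads at 10% of the base input-token price, a 90% discount (e.g., Claude Sonnet drops from $3.00 to $0.30 per 1M input tokens). However, the request that stores the prefix (the cache write) costs 25% more than the base rate ($3.75 per 1M for Sonnet with the default 5-minute TTL), and every miss after the cache expires pays that write premium again. The break-even occurs when the savings from cache hits offset the extra cost of the writes. With a hit rate h, the cached prefix costs (1 - h) × 1.25 + h × 0.10 times the base price per request, which equals the uncached cost at h = 0.25 / 1.15 ≈ 21.7%. For a 10k-token static system prompt on Sonnet ($0.03 per request uncached), a 20% hit rate over 100 requests means 80 cache writes ($3.00) plus 20 cache reads ($0.06), $3.06 in total vs $3.00 without caching, so roughly break-even. Below this hit rate, caching increases costs. Common mistake: enabling caching for dynamic prompts that change on every request (0% hit rate) turns every request into a cache write. That raises the input cost of the cached portion by 25%, or doubles it with the 1-hour TTL, whose writes are billed at 2x the base rate.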
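The break-even arithmetic above can be checked with a small cost model. This is a sketch under stated assumptions. The price multipliers (1.25x write, 0.1x read, 2x for 1-hour writes) are taken from Anthropic's published pricing and may change. Every miss is assumed to rewrite the full prefix. Only the cached prefix is modelled, because uncached input and output tokens cost the same either way. The function names are hypothetical.

```python
# Price multipliers relative to the base input-token price (assumed from
# Anthropic's published pricing): default 5-minute cache writes cost 1.25x
# base and cache reads cost 0.1x base. 1-hour cache writes cost 2x base.
WRITE_MULT = 1.25
READ_MULT = 0.10


def cost_without_cache(n_requests: int, prefix_tokens: int,
                       base_price_per_mtok: float) -> float:
    """Dollar cost of sending the static prefix uncached on every request."""
    return n_requests * prefix_tokens / 1_000_000 * base_price_per_mtok


def cost_with_cache(n_requests: int, hit_rate: float, prefix_tokens: int,
                    base_price_per_mtok: float,
                    write_mult: float = WRITE_MULT,
                    read_mult: float = READ_MULT) -> float:
    """Dollar cost of the cached prefix, assuming every miss rewrites the cache."""
    per_request = prefix_tokens / 1_000_000 * base_price_per_mtok
    hits = n_requests * hit_rate
    misses = n_requests - hits
    return per_request * (misses * write_mult + hits * read_mult)


def break_even_hit_rate(write_mult: float = WRITE_MULT,
                        read_mult: float = READ_MULT) -> float:
    """Solve (1 - h) * write_mult + h * read_mult = 1 for the hit rate h."""
    return (write_mult - 1) / (write_mult - read_mult)


if __name__ == "__main__":
    # Example from this report: 10k-token static prompt, 100 requests,
    # Claude Sonnet at $3.00 per million base input tokens.
    uncached = cost_without_cache(100, 10_000, 3.00)
    for h in (0.0, 0.2, 0.8):
        cached = cost_with_cache(100, h, 10_000, 3.00)
        print(f"hit rate {h:.0%}: cached ${cached:.2f} vs uncached ${uncached:.2f}")
    print(f"break-even (5-minute TTL): {break_even_hit_rate():.1%}")
    print(f"break-even (1-hour TTL):   {break_even_hit_rate(write_mult=2.0):.1%}")
```

With these numbers the cached prefix costs about $3.75 at a 0% hit rate, $3.06 at 20% and $0.99 at 80%, against $3.00 uncached. The break-even hit rate is about 21.7% for the 5-minute TTL and about 52.6% for the 1-hour TTL.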
Lifecycle
2026-06-22T15:35:59.236963+00:00— report_created — created