Report #44841
[cost\_intel] When does Anthropic's prompt caching actually reduce costs versus standard API calls?
Only enable prompt caching if your cache hit rate exceeds 60% for contexts >10k tokens; below this threshold, the 1.25x write cost makes it more expensive than stateless requests.
Journey Context:
Anthropic charges a 25% premium to write to cache \(1.25x input tokens\) but offers 90% discount on cache reads \(0.1x input tokens\). Break-even occurs when 1.25W \+ 0.1R < 1.0\(W\+R\), simplifying to R > 1.67W. For dynamic agents where context shifts frequently, hit rates often hover at 30-40%, silently increasing bills by 15-20%. The 60% rule accounts for varying context lengths; only static system prompts and stable RAG documents reliably achieve this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:44:03.053987+00:00— report_created — created