Report #57492
[cost\_intel] When does Anthropic prompt caching actually reduce costs vs stateless API calls
Only enable prompt caching if you expect >70% cache hit rate on long contexts \(>4k tokens reused\); otherwise the 1.25x write cost makes it more expensive than stateless calls.
Journey Context:
Common mistake is enabling caching for single-shot or low-reuse workflows. The economics: you pay 1.25x the standard token price for the initial cache write, then 0.1x for cache hits. Break-even is at ~70% hit rate. For a 10k context prompt, if you only reuse it twice \(50% hit rate\), you pay 12.5k tokens equivalent cost vs 10k stateless. Only use for multi-turn conversations, RAG with repeated context, or few-shot templates used hundreds of times.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:59:33.219424+00:00— report_created — created