Report #56798
[cost_intel] What is the break-even query volume for Anthropic prompt caching?
Enable prompt caching for static prefixes only if the same prefix will be sent at least twice within the 5-minute TTL (one cache write plus at least one cache hit). Cache writes cost 1.25x standard input ($3.75 vs $3.00 per 1M tokens for Sonnet), but cache hits cost 0.1x ($0.30). The first hit already recoups the write premium; every subsequent hit costs 90% less than standard input.
Journey Context:
Blindly caching every prompt wastes money: you pay a 25% premium to write the cache. If each request's prefix is unique, or requests are spaced more than 5 minutes apart, the cache entry expires before the premium is amortized. A common mistake is to cache the entire prompt including dynamic user variables; only the static prefix should be cached. The math is unforgiving: with one write and no hits, you paid 1.25x for a request that would have cost 1.0x uncached. With one write and one hit, you paid 1.25x + 0.1x = 1.35x for two queries vs 2.0x standard, saving 32.5% on the prefix tokens.
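The arithmetic above can be checked with a small sketch. It assumes every request in the batch lands within the TTL, that the first request is the only cache write, and that costs are counted for the cached prefix tokens only. The function names and default multipliers are illustrative, taken from the figures quoted in this report, and are not part of any SDK.

```python
# Hypothetical helpers; multipliers are the ones quoted in this report.
BASE_PER_MTOK = 3.00   # standard input price, USD per 1M tokens (Sonnet)
WRITE_MULT = 1.25      # cache write multiplier
READ_MULT = 0.10       # cache hit multiplier


def cached_cost(requests: int, base: float = BASE_PER_MTOK) -> float:
    """Prefix cost per 1M tokens for `requests` calls: one write, the rest hits."""
    if requests < 1:
        raise ValueError("requests must be >= 1")
    return base * (WRITE_MULT + READ_MULT * (requests - 1))


def uncached_cost(requests: int, base: float = BASE_PER_MTOK) -> float:
    """Prefix cost per 1M tokens for the same calls with caching disabled."""
    if requests < 1:
        raise ValueError("requests must be >= 1")
    return base * requests


def savings_fraction(requests: int) -> float:
    """Fraction saved by caching; negative means caching cost more."""
    return 1.0 - cached_cost(requests) / uncached_cost(requests)


def break_even_requests(limit: int = 1000) -> int:
    """Smallest request count within the TTL at which caching is cheaper."""
    for n in range(1, limit + 1):
        if cached_cost(n) < uncached_cost(n):
            return n
    raise RuntimeError("caching never pays off within the limit")


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(f"{n} request(s): savings {savings_fraction(n):+.1%}")
    print("break-even at", break_even_requests(), "requests")
```

Under these assumptions a single request loses the 25% write premium, and two requests within the TTL already come out ahead, matching the 32.5% figure above.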
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:49:36.581968+00:00— report_created — created