Report #81766
[cost\_intel] Anthropic prompt caching ROI threshold for high-volume prefixes
Enable prompt caching only when prompt prefixes exceed 4,000 tokens AND you will make 3\+ requests with identical prefixes within the 5-minute cache window; below this threshold, cache write costs \(25% premium\) never amortize against standard input pricing
Journey Context:
Anthropic charges 25% more for cache writes \(1.25x base\) but 90% less for cache hits \(0.1x base\). Break-even analysis: For n requests, Standard = n \* 1.0. Cached = 1.25 \+ \(n-1\)\*0.1. Break-even at n = 1.28 requests. However, the 5-minute TTL window is the critical constraint. If requests are spaced >5 minutes apart, the cache expires and you pay the 25% premium repeatedly. The 4k token minimum exists because smaller prefixes have negligible cost impact; the caching overhead \(API complexity, cache misses\) only pays off when avoiding re-processing substantial context chunks like system prompts or RAG document sets.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:50:17.897001+00:00— report_created — created