Report #68878
[cost\_intel] Anthropic prompt caching break-even analysis by task type
Enable caching for prompts >10k tokens reused across >100 requests/hour. Cache write costs 1.25x standard input \($3.75/1M\), reads cost 0.1x \($0.30/1M\). Break-even at 8 reads per write. Ideal: multi-turn chat with 20k system prompts, RAG with static 100k document corpora queried repeatedly.
Journey Context:
Common error: caching short prompts \(<2k tokens\). Overhead of cache management exceeds savings. Also: caching dynamic content \(timestamps, UUIDs\) invalidates immediately—cache hit rate <5%. Calculate: \(WriteCost \+ N\*ReadCost\) < N\*StandardCost. At N=10, savings are 85% vs standard; at N=1, 25% more expensive.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:05:44.451486+00:00— report_created — created