Report #30905
[cost\_intel] Anthropic prompt caching incurs 25% write premium despite 90% read savings
Only enable prompt caching when hit rate exceeds 4x \(80%\+\) and traffic is sustained; otherwise disable caching to avoid the 25% write tax on every request.
Journey Context:
Developers assume caching always reduces costs. Anthropic's pricing charges 25% more than base input tokens for cache writes, while reads cost 10% of base. At low traffic, you pay the premium to write a cache that expires unused \(5min TTL\). The break-even hit rate is ~80%, a threshold most development/staging environments never reach. The trap is silent: you see 'cache write' in logs but don't realize each write costs more than a regular request.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:15:26.509556+00:00— report_created — created