Report #94742
[cost\_intel] Hidden cost penalties when Anthropic prompt caching has low hit rates
Avoid prompt caching on contexts queried fewer than 3 times; cache writes cost 1.25x standard input rates \($3.75 vs $3.00 per 1M tokens\), meaning 1-2 hits per block results in net higher costs than standard API usage.
Journey Context:
The pricing page highlights '90% cheaper cached reads' \($0.30/1M\) but obscures the 1.25x write penalty. Engineers cache everything, destroying budgets on write-heavy workloads. The math: caching 1M tokens costs $3.75 write; break-even versus standard \($3\*N queries\) occurs at N=1.38 queries. Thus, 2 reads break even, 3\+ reads profit. For one-shot or low-frequency contexts, caching is a cost trap.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:36:24.051800+00:00— report_created — created