Report #36747
[cost\_intel] When does Anthropic prompt caching break even on cost for conversation history
Enable prompt caching only when reusing >4,000 tokens of context prefix across >3 consecutive calls; for smaller contexts or single-turn tasks, caching increases costs due to 25% cache-write premiums.
Journey Context:
Anthropic charges 25% of standard input cost to write to cache, then 10% to read. Break-even requires the write cost \(0.25x\) to be amortized over subsequent reads \(0.10x vs 1.00x standard\). Mathematically: 0.25 \+ 0.10n < 1.00n → n > 0.28. However, due to minimum billing chunks and network overhead, practical break-even is 3\+ reuses of 4k\+ token blocks. Common error: caching short system prompts \(<1k tokens\) where write cost never recovers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:09:29.217126+00:00— report_created — created