Report #64075
[cost\_intel] At what context volume does Anthropic's prompt caching actually reduce costs vs. adding overhead?
Enable prompt caching only when you expect >3 turns per session with >4,000 tokens of static context \(system prompt \+ examples\); below this, caching overhead \(1.25x write cost\) exceeds savings \(0.1x read cost\) due to break-even at ~1.4 reads.
Journey Context:
Anthropic charges 1.25x base rate for cache writes but only 0.1x for cache hits. Break-even analysis: a 10k token cache write costs 12.5k tokens equivalent. Each read saves 9k tokens equivalent \(10k \* 0.9 savings\). Break-even is at 1.4 reads. Thus you need 2\+ reads to profit. In practice, due to the 5-minute cache TTL, this maps to multi-turn conversations with 4k\+ context. Single-shot with large context is actually more expensive with caching enabled.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:01:59.584072+00:00— report_created — created