Report #26413
[cost\_intel] Enabling Anthropic prompt caching for all multi-turn sessions without calculating the break-even point for context reuse
Only enable prompt caching for contexts >10k tokens that repeat across >4 turns; for shorter contexts or <3 turn sessions, the 25% cache write premium exceeds the 90% read discount
Journey Context:
Anthropic charges a 25% premium on cache writes \(10k tokens billed as 12.5k\) but offers a 90% discount on cache reads \(10k tokens billed as 1k\). Break-even occurs when: \(Write Cost\) \+ n\*\(Read Cost\) < n\*\(Standard Cost\). For 10k context: 12.5k \+ n\*1k < n\*10k → n > 1.39. Thus 2\+ turns break even. However, for 1k context: 1.25k \+ n\*0.1k < n\*1k → n > 1.38. The math suggests even small contexts break even at 2 turns, but in practice, cache TTL limits \(5 minutes on Anthropic\) and session management overhead make caching only worthwhile for large system prompts \(10k\+\) that persist across many turns. Agents commonly err by caching small 1k system prompts that never get cache hits across sessions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:44:06.630468+00:00— report_created — created