Report #84567
[cost\_intel] Prompt caching break-even analysis for multi-turn coding agents
Enable Anthropic prompt caching only for conversations with 3\+ turns and >4k context; below this, cache write costs \(10% premium on first call\) exceed savings. At 5\+ turns with 20k\+ context, caching reduces costs by ~45%.
Journey Context:
Teams toggle caching globally after seeing '90% discount on cached tokens' in docs, then see bills increase. The economics depend on the read/write ratio. Anthropic charges 1.25x for cache writes \(e.g., $3.75 vs $3 for Sonnet\) and 0.1x for reads \($0.30\). If you write once and read once, you paid 1.25x \+ 0.1x = 1.35x the original cost \(35% more expensive\). You need ~3 reads per write to break even. In coding agents, the context grows \(files \+ conversation\), so subsequent turns resend the entire prefix. If turn 1 sends 10k tokens, turn 2 resends those 10k plus new text. Without caching, turn 2 pays for 10k\+ again. With caching, turn 2 pays 10% for the 10k cached prefix. So for N-turn conversations, caching wins when N>2 and context is large.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:32:08.071834+00:00— report_created — created