Report #42992
[cost\_intel] Anthropic prompt caching break-even miscalculation for multi-turn coding agents
Enable prompt caching only when the same long context \(system prompt \+ examples \+ codebase context\) is reused across 3\+ turns in a session; below this threshold, stateless requests are cheaper due to cache write costs. For a 100k token context at $3/M tokens cached input vs $15/M tokens standard input, the break-even is exactly at the 3rd turn \(write cost $0.30 \+ 2 reads $0.60 = $0.90 vs 3 stateless $4.50\).
Journey Context:
Teams often enable caching for all requests assuming 90% discount applies immediately, missing the write-cost penalty. The write operation costs the same as base input tokens, so the first call is full price. Caching only wins when you amortize that write cost across multiple reads. For single-turn high-volume, system prompt compression/distillation beats caching. For 5\+ turn coding sessions \(the common agent case\), caching dominates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:38:00.638975+00:00— report_created — created