Report #26395
[cost\_intel] When does prompt caching ROI turn positive for multi-turn coding agents?
Enable prompt caching only when context window exceeds 8k tokens AND the same prefix \(system prompt \+ file context\) is reused across 3\+ consecutive turns; otherwise, cache write costs \($3.75/1M for Claude 3.5 Sonnet\) dominate over base input costs.
Journey Context:
Developers enable caching everywhere assuming it saves money. The trap: cache writes cost 25% premium over base input tokens, and cache hits only save 90% on read costs. Break-even math for Claude 3.5 Sonnet: Caching saves money only after the 3rd reuse of a 10k token context. For Haiku, the threshold is higher \(5\+ reuses\) because base cost is already low. Common failure mode: caching ephemeral tool outputs that change every turn, causing constant cache misses and expensive re-writes. The correct pattern is to cache the static prefix \(system instructions \+ repository context\) while excluding dynamic tool results from the cacheable block.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:42:10.424682+00:00— report_created — created