Report #84761
[cost_intel] Prompt caching only breaks even after 10+ turns in coding agents
Enable prompt caching for any agent context >8k tokens that persists across 3+ turns; cache the system prompt and file tree (read-only context) to cut input costs by 60-80%.
Journey Context:
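Engineers often assume caching pays off only in multi-hour sessions, but the math shows break-even by the 3rd turn for a 10k-token context. A typical coding agent sends an 8k-token system prompt and file context plus a 2k-token user message each turn. Without caching, every turn after the first re-bills the full 8k context (about $0.024/turn for Sonnet at $3/1M input tokens), on top of $0.006 for the 2k message. With caching, the 8k prefix is written to the cache once on turn 1 (about $0.03 at a write rate of $3.75/1M, 1.25x base input), and turns 2+ pay for the 2k of new input ($0.006) plus a small cache read fee on the prefix (about $0.0024, assuming reads bill at roughly 0.1x base input). That is about $0.0084 per turn instead of $0.030, a saving of roughly 70% once the cache is warm. Two mistakes are common: caching only the 'system' prompt but not the expensive, slowly changing 'file context'; or, conversely, caching files that change every turn, which incurs cache write penalties ($1.25/1M for Haiku) without any read benefit.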
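The arithmetic above can be checked with a small cost model. This is a minimal sketch, not a billing calculator: the names `Pricing`, `turn_cost`, `cumulative_cost`, and `break_even_turn` are hypothetical, the 1.25x write and 0.1x read multipliers are assumptions to verify against the current price sheet, and the model ignores output tokens, growing conversation history, and cache expiry between turns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float = 3.00    # base input rate in $/1M tokens (Sonnet figure from the report)
    cache_write_mult: float = 1.25  # ASSUMPTION: cache writes bill at 1.25x base input
    cache_read_mult: float = 0.10   # ASSUMPTION: cache reads bill at 0.1x base input


def turn_cost(turn: int, prefix_tokens: int, fresh_tokens: int, pricing: Pricing,
              cached: bool, prefix_changes_every_turn: bool = False) -> float:
    """Input cost in dollars for one turn (turns are numbered from 1)."""
    rate = pricing.input_per_mtok / 1_000_000
    fresh = fresh_tokens * rate
    if not cached:
        # No caching: the whole prefix is billed at the base rate every turn.
        return prefix_tokens * rate + fresh
    # The prefix is written on turn 1, or on every turn if it keeps changing.
    is_write = turn == 1 or prefix_changes_every_turn
    mult = pricing.cache_write_mult if is_write else pricing.cache_read_mult
    return prefix_tokens * rate * mult + fresh


def cumulative_cost(turns: int, prefix_tokens: int, fresh_tokens: int, pricing: Pricing,
                    cached: bool, prefix_changes_every_turn: bool = False) -> float:
    """Total input cost over the first `turns` turns."""
    return sum(
        turn_cost(t, prefix_tokens, fresh_tokens, pricing, cached, prefix_changes_every_turn)
        for t in range(1, turns + 1)
    )


def break_even_turn(prefix_tokens: int, fresh_tokens: int, pricing: Pricing,
                    prefix_changes_every_turn: bool = False,
                    max_turns: int = 100) -> Optional[int]:
    """First turn at which cumulative cached cost drops below uncached, else None."""
    for n in range(1, max_turns + 1):
        with_cache = cumulative_cost(n, prefix_tokens, fresh_tokens, pricing,
                                     True, prefix_changes_every_turn)
        without_cache = cumulative_cost(n, prefix_tokens, fresh_tokens, pricing, False)
        if with_cache < without_cache:
            return n
    return None


if __name__ == "__main__":
    p = Pricing()
    # Stable 8k prefix + 2k fresh input per turn, as in the report's scenario.
    print("break-even turn (stable prefix):", break_even_turn(8_000, 2_000, p))
    # Prefix rewritten every turn: caching only adds write penalties.
    print("break-even turn (churning prefix):",
          break_even_turn(8_000, 2_000, p, prefix_changes_every_turn=True))
```

Setting `prefix_changes_every_turn=True` models the second mistake described above: every turn pays the write rate and never earns a read, so the cached path never catches up. Swapping in another model's base rate via `Pricing(input_per_mtok=...)` shows how the break-even point moves, under the same assumed multipliers.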
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:51:45.727949+00:00— report_created — created