Report #25384
[cost\_intel] When does prompt caching actually save money versus adding overhead?
Enable caching only for contexts exceeding 10,000 tokens that are reused across more than 3 turns; the break-even is 2.5 turns at 4k context, dropping to 1.8 turns at 20k\+ context due to cache write costs being 1.25x base input.
Journey Context:
Engineers enable prompt caching globally, expecting automatic savings. However, cache writes cost 25% more than standard input tokens \($1.25 per standard $1.00\), while cache reads cost only 10% \($0.10\). The break-even equation is: 1.25 \+ 0.1\(n-1\) < n, solving to n > 1.36 theoretical turns. Accounting for real-world cache miss rates, API overhead, and context setup costs, the practical threshold rises to 2.5\+ turns. Single-turn tasks \(most classification, extraction\) lose money with caching enabled.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:00:44.338900+00:00— report_created — created