Report #35457
[cost_intel] At what conversation length does prompt caching break even on cost?
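Enable prompt caching for multi-turn conversations where the static system prompt + RAG context exceeds 4,000 tokens. Caching costs extra only on turn 1, when the cache write is billed at 1.25x the base input price; from turn 2 onward the static prefix is read at 0.1x, so a conversation comes out ahead as soon as it reaches a second turn. For a 10k-token static prompt over 10 turns, caching cuts the static-prefix input cost by about 78.5% (12.5k + 9×1k = 21.5k token-equivalents vs 100k uncached).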
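The break-even claim can be checked with a short derivation. This is a sketch under stated assumptions: S is the static prefix size in tokens, N the number of turns, w = 1.25 and r = 0.1 the write and read multipliers quoted above, the cache stays warm between turns, and the dynamic (uncached) tokens are ignored because they cost the same either way.

```latex
\begin{aligned}
C_{\text{uncached}}(N) &= N\,S \\
C_{\text{cached}}(N)   &= w\,S + (N-1)\,r\,S \\
C_{\text{cached}}(N) < C_{\text{uncached}}(N)
  &\iff N > \frac{w - r}{1 - r} = \frac{1.15}{0.9} \approx 1.28 \\
\text{savings}(N) &= 1 - \frac{w + (N-1)\,r}{N},
  \qquad \text{savings}(10) = 1 - \frac{2.15}{10} = 78.5\%
\end{aligned}
```

The smallest integer N satisfying the inequality is 2, which is why the second turn is the break-even point under these multipliers.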
Journey Context:
Teams enable caching 'for long chats' but misunderstand the write penalty. You pay 25% extra on turn 1 to populate the cache, and you only save on turn 2+, when the cached prefix is read at 10% of the base cost. Break-even is immediate on the second turn: with caching, a 2-turn conversation costs 1.25 + 0.1 = 1.35x the static prefix versus 2x without, so any conversation with 2+ turns saves money and only single-turn requests lose money (by the 25% write premium). The real win is high-static, low-dynamic scenarios: 50k tokens of legal context with 200-token user questions. Without caching, the static context over 10 turns costs 500k token-equivalents; with caching, 62.5k + 9×5k = 107.5k, about a 4.65x savings. Don't cache if your static context is <2k tokens; the overhead isn't worth it.
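The arithmetic above can be reproduced with a small calculator. This is a minimal sketch, not a billing tool: the function names `static_input_cost` and `break_even_turn` are hypothetical, the 1.25x and 0.1x multipliers are the ones quoted in this report, and it assumes the cache never expires between turns and counts only the static prefix (dynamic tokens are billed the same with or without caching).

```python
def static_input_cost(static_tokens, turns, cached,
                      write_mult=1.25, read_mult=0.10):
    """Token-equivalents billed for the static prefix over a conversation.

    Assumes one cache write on turn 1 and a cache hit on every later turn.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    if not cached:
        # The full static prefix is re-sent at base price on every turn.
        return static_tokens * turns
    # Turn 1 pays the write premium; turns 2..N pay the discounted read.
    return static_tokens * (write_mult + read_mult * (turns - 1))


def break_even_turn(write_mult=1.25, read_mult=0.10, max_turns=1000):
    """First turn count at which caching is strictly cheaper, or None."""
    for n in range(1, max_turns + 1):
        with_cache = static_input_cost(1, n, True, write_mult, read_mult)
        without_cache = static_input_cost(1, n, False)
        if with_cache < without_cache:
            return n
    return None


if __name__ == "__main__":
    print("break-even turn:", break_even_turn())
    for label, tokens in [("10k prompt", 10_000), ("50k legal context", 50_000)]:
        plain = static_input_cost(tokens, 10, cached=False)
        cached = static_input_cost(tokens, 10, cached=True)
        print(f"{label}: {plain:,.0f} uncached vs {cached:,.0f} cached "
              f"({plain / cached:.2f}x)")
```

To turn token-equivalents into dollars, multiply by the base input price per token for the model in use. If the provider's multipliers or cache lifetime differ from the assumptions here, pass different `write_mult` and `read_mult` values and treat any turn that arrives after the cache expires as another write.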
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:59:00.171615+00:00— report_created — created