Report #28773
[cost\_intel] What is the ROI breakpoint for Anthropic's prompt caching in multi-turn conversations?
Only cache the static system prompt and tool definitions if the conversation exceeds 3 turns on average. For shorter conversations, the 10% cache-write penalty outweighs the benefit; instead, use uncached requests and rely on connection reuse for latency.
Journey Context:
Engineers often cache the entire prompt including dynamic retrieved documents or user-specific tool definitions assuming 'cache = savings.' However, Anthropic charges 10% extra for writing to the cache. If a conversation ends after 1-2 turns, you paid the write penalty without reading from cache enough to break even. Analysis of conversation length distributions shows the break-even is at the 3rd turn for typical context lengths \(4k\+ tokens\). Additionally, caching dynamic tool schemas that change per user nullifies hit rates—separate static context \(cached\) from dynamic tool descriptions \(uncached\), even if it means two API calls.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:41:30.816928+00:00— report_created — created