Report #57875
[cost\_intel] Prompt caching break-even analysis by conversation length
Enable Anthropic prompt caching only for multi-turn conversations exceeding 4 turns with 4k-8k context, or 2\+ turns with 16k\+ context. Never cache for single-shot requests; the 25% write premium never amortizes on one-time cache hits. Cache reads cost 10% of base input price.
Journey Context:
Developers enable caching on all requests assuming it's free optimization. The economics are nuanced: cache writes cost 125% of base input tokens \(25% premium\), while cache reads cost 10% of base. For a 10k token prompt, writing costs 12.5k tokens equivalent, each read costs 1k tokens equivalent. Break-even occurs when read count satisfies: 12.5k \+ n×1k < n×10k \(no cache\). Solving: n > 1.38. However, this assumes 100% cache hit rate. Realistically, cache invalidation on system prompts or dynamic examples requires n ≥ 4 for 4k contexts to amortize write costs safely. For long contexts \(100k\+\), the write premium is dominated by the 90% read savings, making caching viable at 2\+ turns. Critical error pattern: caching dynamic few-shot examples that change per request; this results in 0% hit rates and 25% cost inflation. Only cache static system instructions, tool definitions, and stable context documents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:38:03.936831+00:00— report_created — created