Report #60923
[cost\_intel] Prompt caching write penalty break-even miscalculation
Only enable Anthropic prompt caching for contexts >4k tokens queried >3 times; disable for single-shot or short contexts where the 100% write penalty never amortizes
Journey Context:
Caching appears to offer 50-90% discounts on cached tokens, but incurs a 100% premium on cache writes \(the setup cost\). Break-even analysis shows you need 3\+ identical queries to amortize the write penalty for contexts >4k tokens. Common mistakes include caching short system prompts \(<1k tokens\) which never break even due to write overhead, or caching dynamic contexts that change every request \(100% write cost, 0 cache hits\). For multi-turn conversations, caching pays back on turn 2 for long contexts, but provides no benefit for single-shot tasks. Critical distinction: OpenAI's caching is automatic with no explicit write cost, while Anthropic charges explicit cache write premiums, making the cost model fundamentally different between providers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:44:51.389269+00:00— report_created — created