Report #73893
[cost\_intel] Prompt caching cost-negative threshold for multi-turn vs single-shot workflows
Enable Anthropic prompt caching only when the cached prefix exceeds 4,000 tokens AND the hit-rate \(reuse count\) exceeds 4 reads per write; for shorter contexts or single-turn tasks, caching adds 25% write-cost overhead without benefit.
Journey Context:
Developers enable caching based on the intuition 'reuse is cheaper,' but Anthropic's pricing structure charges 25% of base input cost for cache writes \($1.25/1M for Haiku, $9.375/1M for Sonnet\). The break-even requires the cached content to be read at least 4 times to amortize the write premium \(read cost is 10% of base: $0.08/1M for Haiku\). For a 10k token system prompt: caching costs $0.0125 to write vs $0.01 base; each read saves $0.009. Net savings per read after 4th read. Common anti-pattern: caching dynamic few-shot examples that change per request—zero hits, pure 25% cost increase.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:37:34.197235+00:00— report_created — created