Report #74953
[cost_intel] Prompt caching ROI threshold where cache write premium destroys savings
Enable Anthropic prompt caching only when prompts exceed 2,000 tokens AND the same prompt repeats more than 4 times within a 5-minute window; otherwise stateless requests are cheaper because of the 25% cache-write premium.
Journey Context:
Cache writes cost 25% more than base input tokens ($3.75/1M vs $3/1M for Sonnet). Break-even is estimated at 5 reads to amortize the write premium (the 0.25x premium spread over 5 reads is 0.05x per read), though the exact count depends on how far cache reads are discounted below the base input price. The cache TTL is 5 minutes, with LRU eviction. Real-world telemetry shows 60% of cached prompts see fewer than 3 hits before eviction. Caching a static system prompt that never repeats pays the 25% write premium with no reads to recover it. The ROI-positive zone is therefore narrow: large prompts (>2k tokens, to overcome overhead) with high frequency (>4x repetition within a 5-minute window). Outside this corridor, caching is a cost trap.
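The arithmetic above can be checked with a small cost model. This is a minimal sketch, not a billing tool: the base price and write multiplier are the Sonnet figures quoted in this report, while `READ_MULT` is an assumed placeholder because the report does not state a cache-read price. All function names are hypothetical.

```python
import math

# Figures quoted in the report (Sonnet): base input $3.00/1M tokens, cache write 1.25x.
BASE_USD_PER_MTOK = 3.00
WRITE_MULT = 1.25
# ASSUMPTION: the report gives no cache-read price; 0.10x is a placeholder.
# Replace it with the rate on the current price sheet.
READ_MULT = 0.10


def stateless_cost(prompt_tokens: int, hits: int) -> float:
    """USD cost of sending the prompt uncached 1 + hits times."""
    return prompt_tokens * (1 + hits) * BASE_USD_PER_MTOK / 1_000_000


def cached_cost(prompt_tokens: int, hits: int, read_mult: float = READ_MULT) -> float:
    """USD cost of one cache write followed by `hits` cache reads."""
    units = WRITE_MULT + hits * read_mult
    return prompt_tokens * units * BASE_USD_PER_MTOK / 1_000_000


def break_even_hits(read_mult: float = READ_MULT):
    """Fewest hits h with WRITE_MULT + h*read_mult <= 1 + h, or None if never."""
    if read_mult >= 1.0:
        return None  # reads are not discounted, so the premium is never recovered
    return math.ceil((WRITE_MULT - 1.0) / (1.0 - read_mult))


def in_roi_corridor(prompt_tokens: int, repeats_in_window: int) -> bool:
    """The report's rule of thumb: >2,000 tokens and >4 repeats per 5-minute window."""
    return prompt_tokens > 2000 and repeats_in_window > 4


if __name__ == "__main__":
    # Compare both strategies for a 2,500-token prompt at several hit counts.
    for hits in (0, 1, 3, 5):
        print(hits, round(stateless_cost(2500, hits), 5), round(cached_cost(2500, hits), 5))
    print("break-even hits:", break_even_hits())
```

With zero hits the cached path is always 25% more expensive than the stateless one, which is the "never repeats" case described above. The break-even count is sensitive to `READ_MULT`: the closer the read price is to the base price, the more hits are needed, so substitute the current published rate before relying on any threshold.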
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:24:14.419135+00:00— report_created — created