Report #74982
[cost\_intel] Prompt caching enabled but not actually reducing costs — cache hit rate near zero
Only enable prompt caching when your shared prefix exceeds ~1000 tokens AND you expect ≥5 sequential requests hitting the same prefix within the cache TTL \(5 min for Anthropic\). Below that, most cache writes expire before they are read, so you pay the 25% write surcharge repeatedly and collect too few 90%-discounted reads to offset it.
Journey Context:
Teams enable caching on short system prompts or low-traffic endpoints and see costs go up, not down. The math: a cache write costs 25% more than base input tokens, and each cache read saves ~90% of the input token cost. For a 1000-token shared prefix, the write surcharge is ~250 tokens' worth of premium, and each hit saves ~900 tokens' worth of cost. On paper, break-even is about 0.28 hits per write \(250 / 900\), so a single hit already pays for the write. In practice, entries are evicted after the 5-minute TTL, so every write that expires without a hit is pure surcharge, and you need sustained traffic to keep the hit rate up. A daily cron job with a 500-token system prompt will never amortize the write premium, because each run arrives long after the previous entry has expired. The real ROI comes from long system prompts \(>2K tokens\) on high-QPS endpoints where hundreds of requests share the same prefix within the TTL window.
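
The arithmetic above can be checked with a small calculator. This is a minimal sketch, not a billing tool: the function names are hypothetical, the 1.25x write and 0.10x read multipliers are taken from the figures in this report, and costs are expressed in base-input-token units for the shared prefix only (output tokens and the uncached part of each request are ignored). Check your provider's current pricing and minimum cacheable prompt length before relying on it.

```python
# Assumed multipliers, taken from the figures quoted in this report:
# a cache write costs 1.25x base input, a cache read costs 0.10x base input.
WRITE_MULTIPLIER = 1.25
READ_MULTIPLIER = 0.10


def break_even_hits(write_multiplier=WRITE_MULTIPLIER,
                    read_multiplier=READ_MULTIPLIER):
    """Average cache hits needed per write before caching is cheaper."""
    surcharge = write_multiplier - 1.0      # extra cost of one write
    saving_per_hit = 1.0 - read_multiplier  # cost avoided by one read
    return surcharge / saving_per_hit


def prefix_cost(prefix_tokens, writes, hits_per_write,
                write_multiplier=WRITE_MULTIPLIER,
                read_multiplier=READ_MULTIPLIER):
    """Return (uncached, cached) prefix cost in base-input-token units.

    writes: how many times the prefix is (re)written to the cache,
            e.g. once per TTL expiry.
    hits_per_write: average number of cache reads each write receives.
    """
    requests = writes * (1 + hits_per_write)
    uncached = prefix_tokens * requests
    cached = writes * prefix_tokens * (
        write_multiplier + hits_per_write * read_multiplier
    )
    return uncached, cached


if __name__ == "__main__":
    print(f"break-even hits per write: {break_even_hits():.2f}")

    # Daily cron job: 30 runs, every run is a fresh write with zero hits.
    print("daily cron, 500-token prefix:", prefix_cost(500, 30, 0))

    # Busy endpoint: one write followed by 200 hits inside the TTL window.
    print("busy endpoint, 2000-token prefix:", prefix_cost(2000, 1, 200))
```

With zero hits per write, the cached figure is always 25% higher than the uncached one, which is the cron-job case described above. The gap only reverses once the average hit count per write rises above the break-even value.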
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:27:13.947272+00:00— report_created — created