Report #92963
[cost_intel] Enabling prompt caching on all Claude API calls without analyzing request frequency
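Cache only prefixes longer than 4k tokens that are requested more than 4 times per hour. Caching breaks even at n = (WriteCost - ReadCost)/(StandardCost - ReadCost) requests within one cache lifetime. For Claude 3.5 Sonnet ($3.75 write, $3.00 standard, $0.30 read per 1M input tokens) that is (3.75 - 0.30)/(3.00 - 0.30) ≈ 1.3, i.e. 2 requests. The catch is the cache TTL: 5 minutes by default (refreshed on each hit), or 1 hour with the optional extended TTL at a higher write price. Every expiry forces a re-write, so a 4/hour rhythm only keeps the cache warm under the 1-hour TTL; under the default TTL, requests must arrive less than 5 minutes apart to avoid re-write penalties.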
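A minimal Python sketch of the break-even arithmetic, assuming Claude 3.5 Sonnet input prices of $3.00 (uncached), $3.75 (5-minute cache write) and $0.30 (cache read) per 1M tokens, and assuming all n requests land inside a single cache lifetime. It equates the two totals directly (cached: one write plus n - 1 reads; uncached: n standard requests). The names `break_even_requests` and `prefix_cost` are hypothetical helpers, not part of any SDK; check current pricing before relying on the constants.

```python
from math import ceil

# Assumed Claude 3.5 Sonnet input prices in USD per 1M tokens (verify against current pricing).
BASE_INPUT = 3.00   # uncached input tokens
CACHE_WRITE = 3.75  # 5-minute cache write (1.25x base)
CACHE_READ = 0.30   # cache hit (0.1x base)


def break_even_requests(write: float, standard: float, read: float) -> int:
    """Smallest request count n (all within one cache lifetime) at which
    caching costs no more than not caching.

    Cached:   write + (n - 1) * read
    Uncached: n * standard
    Equal when n = (write - read) / (standard - read).
    """
    if standard <= read:
        raise ValueError("cache reads must be cheaper than uncached input")
    return max(1, ceil((write - read) / (standard - read)))


def prefix_cost(n: int, tokens: int, write: float, standard: float,
                read: float) -> tuple[float, float]:
    """USD cost of sending the same `tokens`-long prefix n times: (cached, uncached)."""
    per_tok = tokens / 1_000_000
    cached = (write + (n - 1) * read) * per_tok
    uncached = n * standard * per_tok
    return cached, uncached


if __name__ == "__main__":
    print(break_even_requests(CACHE_WRITE, BASE_INPUT, CACHE_READ))
    for k in (1, 2, 10):
        cached, uncached = prefix_cost(k, 4_000, CACHE_WRITE, BASE_INPUT, CACHE_READ)
        print(k, round(cached, 5), round(uncached, 5))
```

With these prices a single cached request costs 25% more than an uncached one, and the second request inside the cache lifetime already puts caching ahead.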
Journey Context:
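Teams see '90% discount on cached tokens' and cache everything, ignoring the cache write cost ($3.75 per 1M tokens for Claude 3.5 Sonnet, 25% above the $3.00 base input price). For RAG with high document churn, cache hit rates are typically below 20%, which makes caching net negative: at Sonnet prices, caching only pays off above a hit rate of (3.75 - 3.00)/(3.75 - 0.30) ≈ 22%. For stable system prompts reused thousands of times, it is essential. The calculation must include cache expiration: if requests are sporadic (further apart than the TTL, which is 5 minutes by default and 1 hour with the extended option), the cache expires and you pay the write cost repeatedly.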
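The expiration effect is easiest to see by replaying request timestamps. This is a simplified model, not Anthropic's billing logic: it assumes a request arriving within `ttl` seconds of the previous one is a cache hit that refreshes the cache, and any later request pays the write price again. Prices are the same assumed Sonnet figures, plus an assumed $6.00 write price for the 1-hour TTL (2x base input). `simulate_cost` and `break_even_hit_rate` are hypothetical helpers.

```python
# Assumed Claude 3.5 Sonnet input prices in USD per 1M tokens (verify against current pricing).
BASE_INPUT = 3.00
CACHE_WRITE_5M = 3.75  # write price for the default 5-minute TTL (1.25x base)
CACHE_WRITE_1H = 6.00  # assumed write price for the 1-hour TTL (2x base)
CACHE_READ = 0.30


def simulate_cost(timestamps, tokens, ttl=300, write=CACHE_WRITE_5M,
                  read=CACHE_READ, standard=BASE_INPUT):
    """Replay request times (in seconds) for one cached prefix of `tokens` tokens.

    Simplified model: a request within `ttl` seconds of the previous one is a
    cache hit and refreshes the cache; any other request re-writes the prefix.
    Returns (cached_cost, uncached_cost, hit_rate) with costs in USD.
    """
    per_tok = tokens / 1_000_000
    cached = 0.0
    hits = 0
    last = None
    for t in sorted(timestamps):
        if last is not None and t - last <= ttl:
            cached += read * per_tok   # hit: cheap read, TTL restarts
            hits += 1
        else:
            cached += write * per_tok  # first request or expired cache: pay the write again
        last = t
    n = len(timestamps)
    uncached = n * standard * per_tok
    return cached, uncached, (hits / n if n else 0.0)


def break_even_hit_rate(write=CACHE_WRITE_5M, read=CACHE_READ, standard=BASE_INPUT):
    """Hit rate h at which (1 - h) * write + h * read equals standard."""
    return (write - standard) / (write - read)


if __name__ == "__main__":
    every_15_min = [i * 900 for i in range(4)]  # 4 requests per hour
    print(simulate_cost(every_15_min, 4_000))                                  # 5-minute TTL
    print(simulate_cost(every_15_min, 4_000, ttl=3600, write=CACHE_WRITE_1H))  # 1-hour TTL
    print(round(break_even_hit_rate(), 3))
```

Spacing four requests 15 minutes apart shows the rhythm problem: under the 5-minute TTL every request is a re-write and caching costs more than sending the prefix uncached, while under the 1-hour TTL three of the four requests are hits and caching comes out ahead despite the higher write price.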
Lifecycle
2026-06-22T14:37:31.374505+00:00— report_created — created