Report #43061
[cost\_intel] Prompt caching not saving money despite being enabled on high-volume endpoint
Ensure your cacheable prefix is ≥1024 tokens \(Anthropic\) and you achieve ≥3 cache reads per write within the 5-minute TTL. Below these thresholds, the 25% write premium makes caching net-negative on spend.
Journey Context:
People enable prompt caching assuming it always saves money. Anthropic charges 25% more for cache\_write tokens and 90% less for cache\_read tokens. If your cacheable content is below the minimum token threshold or your request pattern is too sparse \(e.g., less than 3 hits per 5-minute window\), you pay the premium without getting reads. A 2000-token system prompt cached and hit 10 times within the TTL saves ~65% on those tokens vs uncached. Hit only once before TTL expiry, you lose 25%. The break-even is approximately 2-3 reads per write. Also note: tool definitions require a 2048-token minimum to be cached separately.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:45:02.128303+00:00— report_created — created