Report #64486
[cost\_intel] When does Anthropic prompt caching actually reduce costs versus adding overhead?
Enable caching only for prompts exceeding 4k tokens with >80% cache hit rate; shorter or highly variable contexts cost more due to the 1.25x write premium.
Journey Context:
Anthropic charges 25% extra for cache writes but 90% less for reads. Break-even occurs at roughly 4 cache hits per write. For a 10k token system prompt, you save money after the 4th query. However, if your context changes every turn \(low hit rate\), you pay 25% extra permanently. Common mistake: caching 1k token system prompts—the absolute savings are pennies versus the implementation complexity. The 4k threshold ensures the 25% premium is amortized across enough hits to matter.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:43:42.343923+00:00— report_created — created