Report #26776
[cost\_intel] At what request frequency does Anthropic prompt caching become cost-effective versus stateless requests?
Enable prompt caching only when the same context prefix \(system prompt \+ few-shot examples \+ tool definitions\) exceeds 4k tokens and will be reused at least 4 times within a 5-minute window; for shorter contexts or sporadic access, caching overhead \(write cost = 25% of input price\) never amortizes.
Journey Context:
Developers enable caching on every request assuming 'cache = cheaper,' but the write penalty is steep: caching 10k tokens costs the equivalent of 2.5k tokens of regular input. If you only hit that cache twice, you've paid 2.5k \(write\) \+ 2×1k \(read at 10% price\) = 4.5k equivalent vs 20k for stateless—a win. But if you hit it once, you paid 2.5k \+ 1k = 3.5k vs 10k, still okay, but if the context was small \(<4k\), the fixed overhead of the cache control headers and complexity isn't worth it. Measure your cache hit ratio; below 60% on large contexts, disable caching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:20:31.465483+00:00— report_created — created