Report #61305
[cost_intel] Prompt caching always saves money on repeated prompts
Only enable prompt caching when your shared prefix exceeds 1024 tokens AND you can guarantee 2+ hits within the 5-minute TTL. A shorter prefix is not cached at all, so it earns no discount; a cacheable prefix that never hits pays a 25% write premium on every call.
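The TTL condition can be checked against a log of request arrival times before caching is turned on. The following is a minimal sketch, not Anthropic's accounting: `estimate_hit_rate` is a hypothetical helper, it assumes every request carries the same prefix, and it assumes each request (write or read) restarts the 5-minute window.

```python
from typing import Iterable

TTL_SECONDS = 300.0  # 5-minute TTL quoted in this report


def estimate_hit_rate(arrival_times: Iterable[float], ttl: float = TTL_SECONDS) -> float:
    """Estimate the fraction of requests that would be cache reads.

    Assumptions (not taken from any official billing formula):
    - all requests share one identical prefix;
    - every request, hit or miss, restarts the TTL window.
    """
    times = sorted(arrival_times)
    if not times:
        return 0.0
    hits = 0
    # A request is a hit when it arrives within `ttl` seconds of the previous one.
    for previous, current in zip(times, times[1:]):
        if current - previous <= ttl:
            hits += 1
    # The first request is always a miss (it writes the cache).
    return hits / len(times)


if __name__ == "__main__":
    # Arrival times in seconds: three close together, one after a long gap.
    print(estimate_hit_rate([0, 60, 120, 1000]))
```

Feeding in real timestamps from a day of traffic gives a rough hit rate to plug into a cost estimate.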
Journey Context:
Anthropic's prompt caching charges 25% more than the base input price for cache writes and 90% less for cache reads. On Sonnet ($3/M input tokens), a cache write costs $3.75/M and a cache read $0.30/M. The math works well at scale, but only if the cache actually hits. Two failure modes destroy ROI: (1) prefixes under the 1024-token minimum are not cached at all, so marking them for caching yields no discount, and (2) if requests sharing a prefix arrive more than 5 minutes apart, every request is a cache miss that pays the write premium. At 100K requests/day with a 2000-token system prompt on Sonnet (200M prefix tokens/day), a 0% hit rate costs $750/day, 25% more than the $600/day without caching, while a 90% hit rate costs about $129/day ($75 for the 10% of tokens written at $3.75/M plus $54 for the 90% read at $0.30/M). That is a nearly 6x cost swing on the same workload.
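The arithmetic above can be reproduced with a short cost model. This is a sketch under the report's stated prices (base $3/M, write at 1.25x, read at 0.10x); the function names are illustrative, and it counts only the shared prefix tokens, not the rest of the input or the output.

```python
BASE_PER_MTOK = 3.00  # Sonnet input price from this report, USD per million tokens
WRITE_MULT = 1.25     # cache write: 25% above base
READ_MULT = 0.10      # cache read: 90% below base


def daily_prefix_cost(requests_per_day: int, prefix_tokens: int,
                      hit_rate: float, base: float = BASE_PER_MTOK) -> float:
    """Daily cost in USD of the cached prefix at a given hit rate (0.0 to 1.0)."""
    mtok = requests_per_day * prefix_tokens / 1_000_000
    # Misses pay the write price, hits pay the read price.
    blended = (1 - hit_rate) * WRITE_MULT + hit_rate * READ_MULT
    return mtok * base * blended


def uncached_prefix_cost(requests_per_day: int, prefix_tokens: int,
                         base: float = BASE_PER_MTOK) -> float:
    """Daily cost in USD of the same prefix with caching disabled."""
    return requests_per_day * prefix_tokens / 1_000_000 * base


def break_even_hit_rate() -> float:
    """Hit rate at which caching costs the same as not caching."""
    return (WRITE_MULT - 1) / (WRITE_MULT - READ_MULT)


if __name__ == "__main__":
    print(round(uncached_prefix_cost(100_000, 2000), 2))    # no caching
    print(round(daily_prefix_cost(100_000, 2000, 0.0), 2))  # 0% hit rate
    print(round(daily_prefix_cost(100_000, 2000, 0.9), 2))  # 90% hit rate
    print(round(break_even_hit_rate(), 3))
```

Under these prices the break-even hit rate is 0.25 / 1.15, or roughly 22%. Below that, caching costs more than leaving it off.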
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:23:02.478647+00:00— report_created — created