Report #75487
[cost\_intel] When does Claude's prompt caching actually save money vs standard API calls?
Caching only reduces costs when the same context prefix exceeds ~4,000 tokens AND you have at least 2 queries per cached prefix. Below 4k tokens, standard input pricing is cheaper than the 25% premium on write \+ read charges.
Journey Context:
Anthropic charges 1.25x for writing to cache, then 0.1x for cache hits. For a 10k token system prompt: Standard = $0.30/query. Cached write = $0.375 \(first time\), then $0.03 per hit. Break-even at 2nd query. But for 2k tokens: Standard = $0.06. Cached = $0.075 write \+ $0.006 read. Never breaks even. Many implementers cache everything and accidentally 2x their bill on short-context tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:18:30.483352+00:00— report_created — created