Report #76937
[cost\_intel] Prompt caching only saves money on Claude if prefix exceeds 1024 tokens
Only enable prompt caching on Anthropic models when your static prefix \(system prompt \+ tools \+ context\) exceeds 1,024 tokens. Below this threshold, the cache write penalty \(10x standard token cost for the first write\) makes caching more expensive than standard API calls. Above 4k static tokens, caching reduces costs by 60-80% on multi-turn conversations.
Journey Context:
Engineers enable caching for all requests after reading the headline '90% discount on cache hits,' but Anthropic charges a premium for cache writes \(10x the base input price for the cached portion\). On a 500-token prefix, the write cost \($0.25 \* 10 = $2.50 per 1M tokens cached\) outweighs the savings for dozens of hits. The break-even math: with a 4k token prefix \($3.00 per 1M input, $30 write cost\), you need 11\+ hits at 90% discount \($0.30\) to amortize the write. This is only achievable in multi-turn agents or high-volume RAG with repeated context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:44:08.943588+00:00— report_created — created