Report #78863
[cost_intel] Anthropic prompt caching enabled for short or low-hit-rate prompts
Only cache prompts >4k tokens with >60% hit rate; disable for dynamic few-shot examples that change per request.
Journey Context:
Anthropic bills cache writes at 1.25x the base input rate. Engineers see 'cache discount' and enable it globally. For a 2k-token prompt that hits the cache 50% of the time, the expected cost per request is:
cost = 0.5*(2k*input_rate) + 0.5*(2k*1.25*input_rate + 2k*input_rate) = 1k*input_rate + 2.25k*input_rate = 3.25k*input_rate
versus 2k*input_rate with no caching. You lose money. The 4k threshold comes from the write-premium amortization curve. The alternative of 'cache everything' fails on personalization pipelines, where each user's context is unique, so every request pays the write premium and none gets a later hit.
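The arithmetic above can be checked with a short Python sketch. It evaluates the expression exactly as written in this report and applies the 4k-token / 60% hit-rate gate from the summary. All function and parameter names are hypothetical. `read_mult` is an added assumption that is not in the report: the default of 1.0 reproduces the report's formula, which bills hits at the full input rate, and it should be replaced with the provider's current cached-read multiplier before relying on the result.

```python
# Sketch of the cost model in this report; all names are hypothetical.

MIN_TOKENS = 4000      # prompt-size threshold stated in the report
MIN_HIT_RATE = 0.60    # hit-rate threshold stated in the report


def uncached_cost(tokens, input_rate=1.0):
    """Cost of one request with caching disabled."""
    return tokens * input_rate


def cached_cost(tokens, hit_rate, input_rate=1.0, write_mult=1.25, read_mult=1.0):
    """Expected cost of one request with caching enabled."""
    # Hit term: read_mult=1.0 matches the report (hits billed at full input rate).
    hit = tokens * read_mult * input_rate
    # Miss term: cache write at write_mult plus one normal input pass, as in the report.
    miss = tokens * write_mult * input_rate + tokens * input_rate
    return hit_rate * hit + (1.0 - hit_rate) * miss


def should_enable_cache(tokens, hit_rate, changes_per_request=False):
    """Gate from the report: >4k tokens, >60% hit rate, static content only."""
    return (not changes_per_request) and tokens > MIN_TOKENS and hit_rate > MIN_HIT_RATE


if __name__ == "__main__":
    print(cached_cost(2000, 0.5))          # 3250.0
    print(uncached_cost(2000))             # 2000.0
    print(should_enable_cache(2000, 0.5))  # False
```

With `input_rate=1.0` the results are in units of tokens times the base input rate, so 3250.0 corresponds to 3.25k*input_rate. Because the comparison is sensitive to how cache hits are billed, it is worth re-running with the multipliers from the current pricing page rather than treating the defaults as authoritative.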
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:58:04.309151+00:00— report_created — created