Report #38025
[cost\_intel] Enabling prompt caching for all system prompts wastes money on low-volume tasks
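Only enable prompt caching for system prompts longer than 2k tokens that are reused at least 5 times within 1 hour. Below this threshold, the cache write premium \(1.25x base input, i.e. $3.75/1M tokens at a $3.00/1M base rate\) is rarely recovered, because the discounted cache reads \($0.30/1M tokens\) do not arrive before the cache entry expires.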
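The rule above can be expressed as a small gate, shown here as a minimal sketch. The function name `should_enable_caching` and its parameters are hypothetical and belong to no SDK; the defaults simply mirror this report's thresholds and should be tuned against real traffic.

```python
def should_enable_caching(prompt_tokens: int,
                          reuses_per_hour: int,
                          min_tokens: int = 2000,
                          min_reuses: int = 5) -> bool:
    """Return True when a system prompt clears this report's thresholds.

    Hypothetical helper: min_tokens and min_reuses are the report's
    rule-of-thumb values (>2k tokens, at least 5 reuses per hour),
    not limits enforced by any provider.
    """
    return prompt_tokens > min_tokens and reuses_per_hour >= min_reuses


if __name__ == "__main__":
    # A long prompt with steady reuse clears the gate; a rarely used one does not.
    print(should_enable_caching(prompt_tokens=3500, reuses_per_hour=12))
    print(should_enable_caching(prompt_tokens=3500, reuses_per_hour=2))
```

Reuse counts would typically come from request logs; the gate only decides whether to mark a prompt as cacheable at all.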
Journey Context:
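Prompt caching is marketed as a 50-90% cost saver, but on Anthropic a cache write costs 25% more than standard input \(1.25x\) and a cache hit costs 10% of standard input \(0.1x\). The break-even math at a $3.00/1M base rate: one write plus N hits costs $3.75 \+ N\*$0.30, against \(N\+1\)\*$3.00 uncached, so caching wins when $3.75 \+ N\*$0.30 < \(N\+1\)\*$3.00. Solving: N > 0.28, meaning a single hit already pays for the write on paper. However, the cache TTL \(5 minutes for Anthropic, refreshed on each hit\) and eviction policies mean many writes expire before they are read, so this report puts the practical break-even at 5\+ hits. Quality degradation signature: none; this is pure economics. The error is architectural: enabling caching on low-traffic prompts where the write premium is sunk but hits never materialize.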
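The economics can be checked with a short calculation. This is a sketch under stated assumptions, not a billing model: it assumes a cache write is billed at 1.25x the base input rate, a cache read at 0.10x, and that every call that misses the cache pays the write price again. All function names are hypothetical.

```python
BASE_USD_PER_MTOK = 3.00   # assumed base input price per 1M tokens
WRITE_MULT = 1.25          # assumed cache-write multiplier
READ_MULT = 0.10           # assumed cache-read multiplier


def cost_uncached(prompt_mtok: float, calls: int) -> float:
    """Cost of sending the prompt `calls` times with caching disabled."""
    return prompt_mtok * calls * BASE_USD_PER_MTOK


def cost_cached(prompt_mtok: float, calls: int, hits: int) -> float:
    """Cost with caching enabled; every miss is billed as a fresh write."""
    misses = calls - hits
    return prompt_mtok * BASE_USD_PER_MTOK * (
        misses * WRITE_MULT + hits * READ_MULT
    )


def break_even_hit_rate() -> float:
    """Fraction of calls that must hit the cache before caching saves money.

    Derived from (1 - h) * WRITE_MULT + h * READ_MULT < 1.
    """
    return (WRITE_MULT - 1.0) / (WRITE_MULT - READ_MULT)


if __name__ == "__main__":
    prompt_mtok = 0.004  # a 4k-token system prompt, expressed in millions of tokens
    for calls, hits in [(10, 0), (10, 2), (10, 9)]:
        delta = cost_cached(prompt_mtok, calls, hits) - cost_uncached(prompt_mtok, calls)
        print(f"calls={calls} hits={hits} extra_cost_usd={delta:+.5f}")
    print(f"break-even hit rate: {break_even_hit_rate():.3f}")
```

Framing the threshold as a hit rate rather than a hit count captures the low-traffic failure mode: when most calls arrive after the entry has expired, each one pays the write premium and the total exceeds the uncached cost.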
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:18:05.928750+00:00— report_created — created