Report #66030
[cost_intel] Prompt caching always saves money on repeated prompts
Enable prompt caching only when your static prefix exceeds 1024 tokens AND you expect at least 4 cache hits per unique prefix. Below these thresholds, the 25% cache-write premium can outweigh the savings from the 90% read discount.
Journey Context:
Anthropic charges 25% more for the first request (the cache write) and 90% less for each cache read. The break-even math: with a 2000-token static prefix at base price P per token, N reads save 0.9×N×2000×P, while the write costs an extra 0.25×2000×P. Pure math puts break-even at 0.25/0.9 ≈ 0.28 hits, but the 1024-token minimum cacheable length and prefix-matching overhead push the practical break-even to roughly 4-5 hits. Teams that cache unique per-user prefixes with low reuse end up increasing their costs. The real win is shared system prompts and few-shot prefixes that are hit by thousands of requests.
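The pure-math part of that break-even can be sketched in a few lines of Python. This is a minimal illustration, not a billing calculator: the function names and the example base price are hypothetical, the two multipliers are the ones quoted in this report, and the model assumes one cache write followed by N reads with no cache expiry. It does not model the practical overhead behind the 4-5 hit figure.

```python
# Multipliers as quoted in this report: a cache write costs 25% more than
# the base input price, a cache read costs 90% less.
WRITE_PREMIUM = 0.25
READ_DISCOUNT = 0.90


def net_savings(prefix_tokens: int, hits: int, base_price: float) -> float:
    """Savings from caching one prefix written once and read `hits` times.

    Positive means caching is cheaper; negative means it costs more.
    Assumption: no cache expiry, and only the cached prefix is counted.
    """
    saved_on_reads = READ_DISCOUNT * hits * prefix_tokens * base_price
    extra_on_write = WRITE_PREMIUM * prefix_tokens * base_price
    return saved_on_reads - extra_on_write


def break_even_hits() -> float:
    """Number of hits at which read savings equal the write premium."""
    return WRITE_PREMIUM / READ_DISCOUNT


if __name__ == "__main__":
    price = 3e-6  # hypothetical base price per input token, in dollars
    for hits in (0, 1, 4, 1000):
        print(hits, net_savings(2000, hits, price))
    print("break-even hits:", round(break_even_hits(), 2))
```

With zero hits the result is negative (the write premium is never recovered), which is the per-user, low-reuse case described above. The savings then grow linearly with the number of hits, which is why widely shared prefixes benefit most.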
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:18:33.998820+00:00— report_created — created