Report #100394
[cost_intel] Does Anthropic prompt caching actually save money, and when does it backfire?
Use Anthropic cache_control only when the same prefix of at least 1,024 tokens is reused several times within the cache TTL (5 minutes by default). Cache reads cost 0.1x the standard input rate (e.g., Sonnet 4.6 cached input is $0.30/MTok vs $3.00/MTok), but every cache write costs 1.25x. A low-traffic internal tool that calls less often than once every 5 minutes pays the write premium on nearly every call and can end up costing more than no caching.
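The break-even point follows directly from the two multipliers. Below is a minimal sketch, assuming every cache miss writes the full prefix at the 1.25x rate (5-minute TTL) and every hit reads it at 0.1x; output tokens and any uncached suffix are left out because caching does not change their price. The function and constant names are hypothetical.

```python
WRITE_MULT = 1.25  # cache write (5-minute TTL), relative to the base input price
READ_MULT = 0.10   # cache read, relative to the base input price


def relative_prefix_cost(hit_rate: float) -> float:
    """Cost of the cached prefix as a fraction of sending it uncached (1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    # Hits pay the read rate; misses pay the write rate to (re)populate the cache.
    return hit_rate * READ_MULT + (1.0 - hit_rate) * WRITE_MULT


def break_even_hit_rate() -> float:
    """Hit rate at which caching costs exactly the same as not caching."""
    # Solve h * READ + (1 - h) * WRITE = 1  ->  h = (WRITE - 1) / (WRITE - READ)
    return (WRITE_MULT - 1.0) / (WRITE_MULT - READ_MULT)


for h in (0.0, break_even_hit_rate(), 0.5, 0.85, 1.0):
    print(f"hit rate {h:.1%}: {relative_prefix_cost(h):.3f}x uncached")
```

Under these assumptions the break-even hit rate is 0.25 / 1.15, roughly 22%: below it, caching costs more than sending the prefix uncached. At an 85% hit rate the prefix costs about 0.27x, a saving of roughly 73% rather than the headline 90%.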
Journey Context:
Teams fixate on the 90% read discount and ignore the write premium and the TTL. Whether caching pays off depends on the hit rate, not the headline discount. For a 10K-token static prefix at an 85% hit rate, caching cuts spend on that prefix by roughly 73% (0.85 × 0.1 + 0.15 × 1.25 ≈ 0.27x the uncached cost), not 90%. At one call every 15 minutes, well outside the 5-minute TTL, every call becomes a fresh write and the prefix costs about 25% more than sending it uncached. Static content must be placed before dynamic content: the cache only matches an identical prefix, so timestamps, request IDs, or per-user metadata in the prefix destroy hit rates.
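Because hit rate is driven by call cadence, a small replay of call timestamps shows how the same 10K-token prefix behaves at a 15-minute interval versus a busy schedule. This is a sketch under stated assumptions: a single cache entry, a 5-minute TTL that is refreshed whenever the prefix is used, the $3.00/MTok Sonnet base input price quoted in the report, and the 1.25x/0.1x multipliers. `simulate` and `CacheBill` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass

TTL_SECONDS = 5 * 60          # assumed 5-minute TTL, refreshed on every use
BASE_PER_MTOK = 3.00          # Sonnet base input price from the report, $/MTok
WRITE_MULT, READ_MULT = 1.25, 0.10


@dataclass
class CacheBill:
    writes: int
    reads: int
    cached_cost: float    # $ spent on the prefix with caching
    uncached_cost: float  # $ the same prefix would cost without caching


def simulate(call_times_s: list[float], prefix_tokens: int) -> CacheBill:
    """Replay ascending call timestamps (seconds) against one cached prefix."""
    per_call_base = prefix_tokens / 1_000_000 * BASE_PER_MTOK
    writes = reads = 0
    expires_at = float("-inf")  # nothing is cached before the first call
    for t in call_times_s:
        if t < expires_at:
            reads += 1   # hit: prefix billed at the read rate
        else:
            writes += 1  # miss: prefix is (re)written at the write rate
        expires_at = t + TTL_SECONDS  # any use refreshes the TTL
    return CacheBill(
        writes=writes,
        reads=reads,
        cached_cost=per_call_base * (writes * WRITE_MULT + reads * READ_MULT),
        uncached_cost=per_call_base * len(call_times_s),
    )


# 8 hours of traffic: one call every 15 minutes vs one call every minute.
slow = simulate([i * 15 * 60 for i in range(8 * 4)], prefix_tokens=10_000)
fast = simulate([i * 60 for i in range(8 * 60)], prefix_tokens=10_000)
for name, bill in (("every 15 min", slow), ("every 1 min", fast)):
    print(f"{name}: {bill.writes} writes, {bill.reads} reads, "
          f"${bill.cached_cost:.2f} cached vs ${bill.uncached_cost:.2f} uncached")
```

At the 15-minute cadence all 32 calls miss, so the prefix costs 1.25x its uncached price; at one call per minute only the first call writes. Real traffic is rarely evenly spaced, so feeding logged request timestamps through the same loop gives a more realistic hit rate than an average request rate.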
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:09:14.287475+00:00— report_created — created