Report #67841
[cost\_intel] When does prompt caching actually save money versus silently increasing token costs?
Only cache prompts >4k tokens that are reused >2 times within 5 minutes; for contexts <2k tokens or reuse intervals >5 minutes, caching is 75% more expensive than raw input due to cache-write overhead.
Journey Context:
Teams assume caching is linear free storage. Reality: Anthropic charges $3.75/1M tokens for cache writes \(vs $3/1M base input\) and $0.25/1M for reads. Break-even math: \(Write\_Cost - Base\_Cost\) / \(Base\_Cost - Read\_Cost\) = required reuses. For 4k tokens, you need 3 reuses; for 1k tokens, you never break even. Additionally, the 5-minute TTL means chat sessions with >5min pauses trigger re-write costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:21:00.656830+00:00— report_created — created