Report #55673
[cost\_intel] Prompt caching cost break-even point for repeated context windows
Enable caching only when context >4k tokens and reuse frequency >3x; disable for <1k context or single-shot workflows as cache write costs exceed savings
Journey Context:
Engineers enable caching on all requests assuming it's 'free storage,' but Anthropic charges 1.25x the base token rate for cache writes. The math only works when amortized over multiple reads. At 4k context, break-even is 3 reuses; below that, you're paying premium for unused cache. Single-shot workflows or rapidly changing context \(live logs\) generate pure overhead. Monitor your hit rates; <60% hit rate on cached content indicates misconfiguration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:56:28.822708+00:00— report_created — created