Report #35054
[cost\_intel] Prompt caching cost break-even analysis for repetitive long-context tasks
Enable Anthropic's prompt caching for any static context prefix exceeding 4,000 tokens that is reused more than twice within a 5-minute window. With cache writes billed at 1.25x the standard input rate and cache reads at 0.1x, caching is already cheaper on the second request \(1.35x versus 2x the uncached prefix cost\), so this reuse threshold is conservative.
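As a minimal sketch of what enabling caching on a static prefix can look like, the helper below builds the `system` content blocks for a Messages API request and marks only the static block with `cache_control`. The function name `build_system_blocks` and its parameters are hypothetical, chosen for illustration; the sketch builds the payload only and does not call the API.

```python
def build_system_blocks(static_prefix: str, dynamic_suffix: str = "") -> list[dict]:
    """Return system content blocks with a cache breakpoint after the static prefix.

    Hypothetical helper: only the block layout and the cache_control marker
    follow Anthropic's prompt caching format; everything else is illustrative.
    """
    blocks = [
        {
            "type": "text",
            "text": static_prefix,
            # Marks the end of the cacheable prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ]
    if dynamic_suffix:
        # Per-request content goes AFTER the breakpoint so it never alters the cached prefix.
        blocks.append({"type": "text", "text": dynamic_suffix})
    return blocks


if __name__ == "__main__":
    blocks = build_system_blocks("Long, unchanging instructions...", "Request time: 12:00 UTC")
    for block in blocks:
        print(block["type"], "cache_control" in block)
```

The returned list is intended to be passed as the `system` argument of a request. Keeping the dynamic text in its own trailing block is the design choice that matters: the cached prefix stays byte-identical across requests.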
Journey Context:
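Without caching, sending a 10k-token system prompt five times bills 50k input tokens. With caching, the same traffic bills 12.5k token-equivalents \(one write at 1.25x\) plus 4\*1k \(four reads at 0.1x each\), or 16.5k in total, a 67% reduction. The common anti-pattern is placing dynamic content such as timestamps or user IDs inside the cached prefix: the prefix then changes on every request, every call is a cache miss, and each one pays the 25% write premium with no offsetting reads. Monitor cache hit rates. Under these multipliers, and assuming every miss pays a full write, caching costs more than no caching only when the hit rate falls below about 22% \(0.25 / 1.15\); a hit rate under 60% is still worth investigating as a sign that the prefix is not stable.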
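The arithmetic above can be reproduced with a short sketch. It assumes the multipliers quoted in this report (1.25x write, 0.1x read), that all requests land inside the cache lifetime, and that every miss pays a full write. The function names are illustrative, and output tokens and any non-cached input are ignored.

```python
WRITE_MULT = 1.25  # cache-write multiplier quoted in this report
READ_MULT = 0.10   # cache-read multiplier quoted in this report


def uncached_cost(prefix_tokens: int, requests: int) -> float:
    """Token-equivalents billed when the prefix is sent at the standard rate each time."""
    return float(prefix_tokens * requests)


def cached_cost(prefix_tokens: int, requests: int) -> float:
    """One cache write followed by (requests - 1) cache reads."""
    if requests <= 0:
        return 0.0
    return prefix_tokens * (WRITE_MULT + READ_MULT * (requests - 1))


def break_even_requests() -> int:
    """Smallest request count at which the cached total is below the uncached total."""
    n = 1
    while WRITE_MULT + READ_MULT * (n - 1) >= n:
        n += 1
    return n


def break_even_hit_rate() -> float:
    """Hit rate h at which h*READ_MULT + (1 - h)*WRITE_MULT equals 1.0 per request."""
    return (WRITE_MULT - 1.0) / (WRITE_MULT - READ_MULT)


if __name__ == "__main__":
    plain = uncached_cost(10_000, 5)
    cached = cached_cost(10_000, 5)
    print(f"uncached: {plain:.0f}, cached: {cached:.0f}")
    print(f"reduction: {1 - cached / plain:.0%}")
    print(f"first request count where caching is cheaper: {break_even_requests()}")
    print(f"break-even hit rate: {break_even_hit_rate():.1%}")
```

Changing `WRITE_MULT` and `READ_MULT` is enough to rerun the analysis if pricing differs from the figures assumed here.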
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:18:48.904394+00:00— report_created — created