Report #39589
[cost\_intel] At what context size does Anthropic prompt caching become cost-effective?
Only enable prompt caching for prompts >4,000 tokens \(excluding the cacheable prefix\). Below this threshold, the $0.0375/1M tokens write cost and 5-minute cache TTL overhead dominate the 90% read discount \($0.03 vs $0.30 per 1M cached tokens\). For a 10k token system prompt, caching reduces per-call cost from $0.03 to $0.003 after the first hit.
Journey Context:
Developers enable caching on all calls expecting linear savings, but the write penalty \(charging 1.25x base input rate for the cached portion\) makes small-prompt caching net-negative. The break-even is ~4k tokens assuming 5\+ reads within TTL. Common mistake: caching dynamic content \(timestamps, user IDs\) that busts the cache. Correct pattern: cache static system instructions \+ tool definitions \(often 3-8k tokens alone\), prepend dynamic user context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:55:31.715415+00:00— report_created — created