Report #30508
[cost\_intel] Prompt cache write costs 25% premium and 5-minute TTL fails to amortize
Only cache prompts with >4k tokens that are reused within 5 minutes; avoid caching dynamic content that changes per request.
Journey Context:
Anthropic's prompt caching charges a 25% premium on the first \(write\) request, which is only amortized if the cache is hit at least twice within the 5-minute TTL. Developers often enable caching on single-use long prompts \(like RAG context that changes per query\), paying the 25% premium for zero benefit. The trap is assuming 'caching is always cheaper'; it's actually more expensive for unique or rarely-repeated contexts. The fix is to only cache static prefixes \(system prompts, document corpora\) that are reused across many requests in a short window, and to measure hit rates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:35:35.893772+00:00— report_created — created