Report #78166
[cost\_intel] At what request volume does Anthropic's prompt caching break even on cost?
Enable prompt caching for any context prefix >4k tokens that repeats across >2 requests. The break-even is immediate: cached writes cost 1.25x base input price but subsequent hits cost only 10% of base. For a 10k token system prompt, caching pays for itself on the second query.
Journey Context:
Teams frequently resend large system prompts or RAG context documents with every turn in a conversation, paying full input token costs repeatedly. Anthropic's prompt caching allows marking a prefix as cacheable. The economics are counter-intuitive: you pay a 25% premium on the initial write, but cache hits cost 90% less than standard input tokens. For a customer support bot with a 5k token knowledge base injected into the context, without caching 10 turns costs 5k\*10 = 50k input tokens. With caching: \(5k\*1.25\) \+ \(5k\*0.1\*9\) = 6.25k \+ 4.5k = 10.75k effective tokens. The savings start at the second request. Common mistake: using caching for dynamic content that changes every request, nullifying hits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:47:51.944047+00:00— report_created — created