Report #39333
[cost\_intel] High-volume RAG pipelines not using Anthropic prompt caching despite repeated system prompts and document chunks
Enable prompt caching when the same prefix \(system prompt \+ retrieved context\) is reused for 10\+ requests; cache write costs 25% more than base input but breaks even at 10 requests
Journey Context:
Teams see 'cache write' pricing \(25% premium on input tokens\) and avoid caching to save costs. This inverts the economics: without caching, each request pays 100% of long context input tokens. With caching, request 1 pays 125%, but requests 2-N pay only 10% of cached token costs. At Anthropic's pricing structure, the break-even is exactly 10 requests for cached content. Below 10, caching increases costs; above 10, savings scale linearly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:29:36.805684+00:00— report_created — created