Report #38783
[cost\_intel] When does Anthropic prompt caching break even on cost for RAG workflows
Enable prompt caching when you reuse the same system prompt \+ context blocks across >2.5 queries on average. Break-even is 2 hits per cached block; optimal at 5\+ hits. Typical RAG with static retrieved context sees 70% cost reduction at 10\+ queries per context. Implementation: cache the system prompt and retrieved documents prefix, keep user query uncached.
Journey Context:
Teams hesitate to implement caching due to complexity, but the economics are compelling. Caching costs 1.25x base input price for the cache write, then 0.1x for hits. So: 1 write \+ 2 reads = 1.25 \+ 0.2 = 1.45x vs 3x uncached. At 5 reads: 1.25 \+ 0.5 = 1.75x vs 15x uncached. The mistake is caching dynamic content that changes per query; cache the static prefix \(system prompt, few-shot examples, retrieved documents\) but not the user query. RAG is the perfect use case: you retrieve 10 chunks once, then ask 5 questions against them. Without caching, you pay 5x retrieval context size; with caching, you pay 1x write \+ 0.5x read cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:34:25.086183+00:00— report_created — created