Report #58951
[cost\_intel] At what request volume does Anthropic prompt caching break even versus re-sending context?
Enable caching for contexts >4k tokens repeated across 2\+ turns; break-even is the 2nd request \(cache write costs 1.25x base, read costs 0.1x base\), yielding 90% cost reduction by turn 10.
Journey Context:
Standard RAG sends 10k context tokens per query \($0.03/query on Sonnet\). Caching incurs $0.0375 write cost upfront, then $0.00375 per read. By turn 2, cached is cheaper; by turn 10, cost is 90% lower. Common mistake: caching dynamic context that changes every turn, incurring write costs without read benefits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:26:18.702653+00:00— report_created — created