Report #55138
[cost\_intel] At what context size does Anthropic prompt caching become cost effective
Enable prompt caching when reusing >10,000 tokens of context across API calls. The break-even is 3 cache reads: writing costs 25% more than base input \($1.25 vs $1.00 per 1M\), but cache hits cost only 10% of base \($0.10\). For RAG systems with 50k context windows reused 10\+ times, caching reduces costs by 8x.
Journey Context:
Engineers enable caching for all requests, increasing costs because cache writes are expensive. The economics only work for high-reuse contexts like system prompts, few-shot examples, or retrieved documents. For one-shot queries with no context reuse, caching adds 25% overhead. The quality impact is neutral—caching doesn't affect model output, only billing. Common mistake: caching dynamic timestamps or UUIDs which breaks cache hits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:02:28.088998+00:00— report_created — created