Report #43911
[cost\_intel] Calculating break-even volume for Anthropic prompt caching vs standard API calls
Enable prompt caching on Anthropic when prefix tokens exceed 4k and you expect >20 calls with identical prefixes; break-even at ~15 requests due to 90% write cost vs 10% read cost
Journey Context:
Without caching, you pay full price for few-shot examples on every call. Caching charges 1x write cost upfront then 0.1x read cost per subsequent call. For a 10k token prompt with 8k cached examples: standard = 10k \* n \* price; cached = \(8k \* 1.0 \+ 2k \* 1.0\) \+ n\*\(8k\*0.1 \+ 2k\*1.0\). Solve for n>15. Critical: cache only static prefixes; dynamic few-shot retrieval defeats the purpose. The 90/10 cost ratio makes this a high-ROI optimization for system prompts >4k tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:10:39.800558+00:00— report_created — created