Report #78985
[cost_intel] Enabling Anthropic prompt caching without calculating reuse break-even
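Only enable prompt caching for contexts over 1k tokens that are reused at least 2 times within 5 minutes. Cache writes carry a 25% premium over the base input price and cache reads cost 10% of base, so one write followed by N reads costs 1.25 + 0.1N base units versus 1 + N uncached, which breaks even at N ≈ 0.28 reads (0.25 / 0.9), not ~1.4. A single guaranteed hit already pays for the write; the ≥2 threshold leaves margin for the 5-minute TTL, since an entry evicted before it is reused has to be written again at the premium rate.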
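The break-even arithmetic can be checked with a short sketch. It assumes only the multipliers stated in this report (1.25x base for a cache write, 0.10x base for a cache read) and a single write followed by some number of reads within the TTL; the function names are hypothetical and the prices are parameters, not authoritative values.

```python
def uncached_cost(base_per_mtok: float, tokens: int, reads: int) -> float:
    """Cost of sending the context at full price: the first request plus `reads` repeats."""
    return base_per_mtok * (tokens / 1_000_000) * (1 + reads)


def cached_cost(base_per_mtok: float, tokens: int, reads: int,
                write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Cost of one cache write followed by `reads` cache hits (no eviction assumed)."""
    return base_per_mtok * (tokens / 1_000_000) * (write_mult + read_mult * reads)


def break_even_reads(write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Reads per write at which cached and uncached cost are equal.

    Solves write_mult + read_mult * n == 1 + n for n.
    """
    return (write_mult - 1.0) / (1.0 - read_mult)


if __name__ == "__main__":
    base = 3.0          # $/MTok, the Sonnet figure quoted in this report
    tokens = 50_000     # illustrative context size (assumption)
    for reads in (0, 1, 2, 5):
        print(reads, uncached_cost(base, tokens, reads), cached_cost(base, tokens, reads))
    print("break-even reads per write:", break_even_reads())
```

With zero reads the cached path is strictly more expensive; from the first hit onward it is cheaper. In practice the useful input is the expected number of hits per write before the TTL lapses, which has to be estimated from traffic.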
Journey Context:
Developers often enable caching on every system prompt to 'save money,' but the 25% write surcharge (e.g., $3.75/MTok vs. $3/MTok base input for Sonnet) means a context that is written once and never read costs more than it would uncached. Caching pays off in RAG pipelines where the same few-shot examples or document chunks are hit repeatedly by different users within a short window (e.g., a shared customer-support context). For long contexts that are unique to each user, every request is a cache write and the surcharge is pure loss.
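One way to apply this rule is to attach the cache marker only when a context is both large enough and expected to be reused. The sketch below is a minimal illustration, not a definitive implementation: `build_system_blocks`, its thresholds, and the token and reuse estimates are hypothetical names and inputs the caller must supply. The only API detail relied on is the `cache_control: {"type": "ephemeral"}` field on a system content block in the Anthropic Messages API.

```python
def build_system_blocks(shared_context: str,
                        approx_tokens: int,
                        expected_reads: float,
                        min_tokens: int = 1024,
                        min_reads: float = 2.0) -> list[dict]:
    """Return system content blocks, marking the context cacheable only when it qualifies.

    approx_tokens and expected_reads are caller-supplied estimates (assumptions);
    the defaults mirror this report's rule of >1k tokens and >=2 reuses per window.
    """
    block = {"type": "text", "text": shared_context}
    if approx_tokens >= min_tokens and expected_reads >= min_reads:
        # Shared, frequently reused context: pay the write premium once.
        block["cache_control"] = {"type": "ephemeral"}
    return [block]


# Shared support context hit by many users: marked for caching.
shared = build_system_blocks("...support playbook...", approx_tokens=8000, expected_reads=12)

# Per-user long context seen once: left uncached to avoid the write surcharge.
unique = build_system_blocks("...one user's document...", approx_tokens=8000, expected_reads=0)
```

The returned list can be passed as the `system` argument of a Messages API request. Keeping the decision in one helper makes it straightforward to log which contexts were marked and compare that against observed cache hits.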
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:10:11.174005+00:00— report_created — created