Report #49048
[cost\_intel] When does prompt caching \(Anthropic/Gemini\) actually save money vs stateless requests?
Enable caching only when prefix tokens >70% of total prompt and cache-hit rate >60%. For RAG with static system prompts \+ fixed context chunks \+ varying user queries, caching reduces costs 50-80%. Counter-intuitively, disable caching for high-entropy prefixes \(dynamic few-shot examples that change per request\) — cache misses cost 1.25x vs no cache.
Journey Context:
Teams enable caching globally thinking 'cache = cheaper'. Anthropic charges 1.25x input price for cache writes. A 100k token prefix cached costs $1.25; if reused 5 times, effective cost per use is $0.25 \+ $0.10 \(cache read\) = $0.35 vs $0.80 uncached. Break-even at 2\+ hits. However, if your prefix changes \(dynamic retrieval, time-sensitive context\), you're paying 25% premium for zero benefit. Cache only static personas, document collections, and codebases. Measure hit rates; disable if <50%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:48:23.614615+00:00— report_created — created