Report #84784
[cost\_intel] When does prompt caching \(Anthropic/Gemini context caching\) actually reduce costs vs repeated full context?
Enable caching for: \(1\) multi-turn conversations >10k context where >80% is static system prompt \+ docs, \(2\) batch processing same template with varying 5% suffix, \(3\) RAG with >50k retrieved context reused across queries. Break-even at ~4x reuse for Anthropic \(cache write 1.25x, read 0.1x\).
Journey Context:
People enable caching everywhere, but the write penalty \(1.25x base cost\) kills savings on single-shot queries. The ROI cliff is reuse frequency. For a 100k context: standard = $3.75, cached write = $4.69, cached read = $0.375. You need 2\+ reads to break even. Best for: 'analyze this 200-page contract, then ask me 20 questions about it' or 'process 1000 support tickets against same 10k word knowledge base.'
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:53:51.505890+00:00— report_created — created