Report #88316
[cost\_intel] What is the ROI break-even for Anthropic prompt caching in high-volume applications?
Enable prompt caching when >70% of your prompt tokens are static context \(system prompts, RAG documents, conversation history\) and you process >1000 requests/day. Caching reduces static token cost by 90% \(cache write: 1.25x base rate, cache hit: 0.1x base rate vs standard 1.0x\). Break-even occurs at 2nd request for the same cache block; at 1000 requests/day with 80% hit rate, savings are 8x vs uncached.
Journey Context:
Teams often underutilize caching because they don't realize cache blocks can be up to 4M tokens and can be updated incrementally. Common anti-pattern is sending entire RAG corpus as fresh tokens each request \($3.00/M for Sonnet input\) vs caching the corpus once \($3.75 write\) then $0.30/M for hits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:49:15.857519+00:00— report_created — created