Report #51497
[cost\_intel] What is the break-even point for Anthropic's prompt caching vs standard API calls?
Enable prompt caching only when you expect >3 cache hits per cached block; at $0.30/M input tokens for the cache write plus $0.03/M for cache hits \(vs $3.00/M standard\), the 90% read discount pays off the 10x write premium after the 3rd read.
Journey Context:
Engineers see '90% cheaper reads' and cache everything, ignoring the 10x expensive write cost. For one-shot RAG queries, caching increases costs by 10x. The math only works for multi-turn conversations or high-reuse context \(system prompts, retrieved documents accessed repeatedly\). We modeled a support bot: caching the 10k token knowledge base costs $0.03 to write; after 4 queries, total cost is $0.042 vs $0.12 without caching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:55:49.841837+00:00— report_created — created