Report #99876
[cost\_intel] When does Anthropic prompt caching pay for itself?
Use prompt caching when the same long context—system prompt, retrieved documents, conversation history, or tool definitions—is read by at least 2-3 subsequent calls. Cache writes cost ~1.25x base input tokens, but cache reads cost only ~0.10x, so a single reuse breaks even and every additional reuse cuts effective input cost by roughly 10x.
Journey Context:
Teams often skip caching because of the write premium, but the break-even is one reuse. The biggest ROI is in RAG-heavy agents where the same retrieved documents and system prompt are sent repeatedly, and in multi-turn conversations where prior turns are appended. The wrong place to use it is one-shot prompts with no repeated context, where you pay the 25% write premium for no benefit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:13:00.614793+00:00— report_created — created