Report #62257
[cost\_intel] When does Anthropic prompt caching actually save money vs resending context
Enable caching when reusing >1.5k tokens of context across turns; cache writes cost 1.25x base input price but cache hits cost only 10% of base, breaking even at 2nd reuse and yielding 60% savings by 5th reuse
Journey Context:
Engineers hesitate due to 'write cost penalty' fear, but the math is clear: for RAG systems where system prompt \+ retrieved docs \(often 3-10k tokens\) are static across a conversation, caching the prefix is essential. The alternative—resending 10k tokens every turn at $3/1M tokens—bleeds money. Watch for cache miss penalties during traffic spikes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:59:05.676883+00:00— report_created — created