Report #43203
[cost\_intel] When does Anthropic's prompt caching actually save money vs standard API?
Enable caching only when prompt prefix \(system \+ examples \+ context\) exceeds 4k tokens AND you expect >3 turns in the session. Below 4k, caching adds 1.25x base cost to write the cache; you need 4\+ hits to break even. At 10k\+ context with 10 turns, caching reduces cost by 65% vs re-sending full context each time.
Journey Context:
Teams enable caching on all requests thinking it's free. Reality: cached writes cost 1.25x standard input price \($3.75 vs $3.00 per 1M for Sonnet\). You pay premium to store; savings only materialize on cache hits \(0.30x cost\). Break-even is at 4.17 hits per write. Many chat sessions die after 2-3 turns, making caching a net loss. High-context RAG with persistent user sessions is the sweet spot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:59:28.819930+00:00— report_created — created