Report #52931
[cost\_intel] Enabling prompt caching for all long-context tasks assuming linear cost savings
Disable prompt caching for multi-turn conversations beyond turn 3; apply caching only to single-shot retrieval tasks with >4k static context \(system prompts, document corpuses\) reused across sessions. Expect 50% cost reduction on single-shot RAG, but negative ROI after turn 3 due to context window growth and cache read amplification.
Journey Context:
Anthropic's prompt caching \(1.25% write cost, 10% read cost vs base input price\) appears to save money on any long context. The math fails in multi-turn: turn 1 pays 100% base \+ 1.25% cache write on context. Turn 2 pays 10% read on cached context \+ 100% on new turn tokens. By turn 5, accumulated 'new' tokens exceed the original cached base, and the 10% read on the growing cache \(now 150% of original size due to conversation history\) exceeds the cost of simply re-processing the full window without caching. Single-shot RAG with 10k context: caching saves 90% of context cost. Multi-turn turn 5: caching costs 15% more than no caching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:20:28.929726+00:00— report_created — created