Report #76235
[cost_intel] Prompt caching ROI negative for dynamic RAG conversational flows
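Disable Anthropic prompt caching for RAG where retrieved chunks vary per turn. Cache writes carry a 25% premium over the base input price ($3.75 vs $3.00 per 1M tokens on Sonnet), so when every miss rewrites the cache you need a hit rate of roughly 22% to break even. Dynamic context yields <20% hit rates, so caching raises the cost of the cached tokens, by up to 25% at a near-zero hit rate, versus no caching.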
Journey Context:
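Engineers enable prefix caching assuming a static system prompt will save money, but Anthropic bills cache writes at 125% of the base input price ($3.75 vs $3.00 per 1M tokens for Sonnet) and cache reads at 10% ($0.30). In RAG, the 'context' includes freshly retrieved chunks that change every turn, so any cached span that contains them misses on the next turn and is rewritten at the premium rate. Break-even analysis, with x as the base input price: each write costs an extra 0.25x and each read saves 0.9x (90% discount). A single read more than repays one write, but only if that read actually happens. When every miss triggers a new write, the expected cost at hit rate h is 0.1x * h + 1.25x * (1 - h), which drops below x only when h exceeds 0.25 / 1.15, about 22%. Conversational RAG rarely repeats an identical full context often enough to reach that. Use caching only for stable prefixes, such as few-shot examples with static schema definitions, or codebases whose files don't change.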
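The break-even reasoning can be checked with a small cost model. This is a minimal sketch, not a billing calculator. It assumes cache writes are billed at 1.25x the base input price and cache reads at 0.10x (the multipliers for the default short-lived cache, as far as I know), and that every request either hits the cache or misses and rewrites it. The function names are hypothetical, and the model ignores uncached tokens, output tokens, and cache expiry.

```python
def expected_cost_multiplier(hit_rate: float,
                             write_mult: float = 1.25,
                             read_mult: float = 0.10) -> float:
    """Expected cost of the cacheable tokens relative to uncached input (1.0).

    Simplified model (an assumption): each request either hits the cache and
    is billed at read_mult, or misses and rewrites it at write_mult.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * read_mult + (1.0 - hit_rate) * write_mult


def break_even_hit_rate(write_mult: float = 1.25,
                        read_mult: float = 0.10) -> float:
    """Hit rate at which caching costs the same as not caching."""
    # Solve h * read_mult + (1 - h) * write_mult == 1 for h.
    return (write_mult - 1.0) / (write_mult - read_mult)


if __name__ == "__main__":
    print(f"break-even hit rate: {break_even_hit_rate():.1%}")
    for h in (0.0, 0.1, 0.2, 0.5, 0.9):
        mult = expected_cost_multiplier(h)
        print(f"hit rate {h:.0%}: {mult:.3f}x the uncached cost")
```

Under these assumptions any hit rate below the break-even value makes caching a net loss on the cached tokens, and a workload that almost always hits pays close to 0.10x. Measure the real hit rate from the cache read and cache creation token counts in the API usage data before deciding.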
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:32:56.538699+00:00— report_created — created