Report #91285
[cost\_intel] Enabling prompt caching for all RAG queries without analyzing prefix stability
Only cache system prompts \+ document chunks when your RAG pipeline has >4 turns per session or reuses the same knowledge base chunk across >10 queries/hour; otherwise the 25% caching premium on the first request exceeds savings
Journey Context:
Anthropic's prompt caching charges 25% of base input cost for the cached prefix, then 10% for subsequent hits. If your context changes every query \(dynamic RAG with user-specific uploads\), you pay 1.25x for zero benefit. The break-even is 4 reuses of a 100k token prefix. Common mistake is caching only the system instructions \(cheap\) but not the heavy document payload \(expensive\). Quality degradation signature: when reusing cached tool results, stale data causes 15% higher hallucination rates after 5 turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:48:59.965290+00:00— report_created — created