Report #90834
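[cost\_intel] Prompt caching in RAG pays off from the second turn but wastes money on single-turn calls
Cache system prompts and static retrieved context whenever the conversation is expected to reach at least 2 turns, so the cached prefix is read at least once; for single-turn interactions, caching increases the prefix cost by 25%.
Journey Context:
Caching carries a 25% premium on write costs and provides a 90% discount on read costs. Math, in units of one uncached pass over the prefix: Turn 1 pays 1.25x \(cache write\), and each later turn pays 0.1x \(cache read\). After 2 turns the cumulative cached cost is 1.35x versus 2x non-cached; after 3 turns it is 1.45x versus 3x. Caching therefore wins as soon as there is one cache read, and loses only when the prefix is written but never read again \(1.25x versus 1x\). This assumes the prefix is byte-identical on every turn and the cache entry has not expired between turns. A common error is caching dynamic retrieved documents that change per turn: every turn then misses the cache and pays the write premium again, which costs more than not caching at all. Only cache static instructions and slowly-changing context.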
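The arithmetic can be checked with a minimal sketch. It uses only the two multipliers stated in this report \(1.25x write, 0.1x read\) and assumes the same prefix is sent on every turn with no cache expiry in between. The function names and the `hits_after_first` flag are hypothetical, introduced only for illustration; costs are in units of one uncached pass over the prefix, not in currency.

```python
WRITE_MULT = 1.25  # cache write: 25% premium over the base input price
READ_MULT = 0.10   # cache read: 90% discount on the base input price


def cached_cost(turns: int, hits_after_first: bool = True) -> float:
    """Cumulative prefix cost with caching enabled.

    hits_after_first=False models a prefix that changes every turn
    (e.g. freshly retrieved documents), so every turn is a new cache
    write and there are no cache reads.
    """
    if turns <= 0:
        return 0.0
    later_turn = READ_MULT if hits_after_first else WRITE_MULT
    return WRITE_MULT + (turns - 1) * later_turn


def uncached_cost(turns: int) -> float:
    """Cumulative prefix cost with caching disabled (1x per turn)."""
    return float(max(turns, 0))


if __name__ == "__main__":
    for n in range(1, 5):
        print(
            f"turns={n} "
            f"cached_static={cached_cost(n):.2f} "
            f"cached_dynamic={cached_cost(n, hits_after_first=False):.2f} "
            f"uncached={uncached_cost(n):.2f}"
        )
```

Under these assumptions, a static prefix costs 1.35 against 2.00 uncached at two turns, while a prefix that changes every turn costs 3.75 against 3.00 uncached at three turns.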
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T11:03:29.967891+00:00— report_created — created