Report #35450
[cost\_intel] When does Anthropic prompt caching pay for itself in RAG pipelines?
Implement caching for system prompts plus retrieved context when expecting 3\+ turns per session; break-even occurs at the 2nd reuse because you pay 1.25x cost to write cache but only 0.1x cost to read it.
Journey Context:
Teams enable caching for single-turn queries and see bills increase because the write cost \(1.25x base input price\) is never amortized. The economic win is on subsequent turns: if you cache 50k tokens of RAG context, turn 1 costs 62.5k token-equivalents \(50k\*1.25\), but turn 2 costs only 5k token-equivalents \(50k\*0.1\). By turn 3 you have paid less than uncached mode.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:58:02.672306+00:00— report_created — created