Report #87869
[cost\_intel] Enabling prompt caching for single-turn RAG queries without calculating break-even
Only cache static system instructions \+ tool definitions in RAG if your session averages >5 turns or you reuse the same context prefix across >10 queries; otherwise, the cache write cost \(1.25x standard input\) eliminates savings versus the 0.1x hit discount.
Journey Context:
Anthropic and OpenAI charge 1.25x for cache writes but 0.1x for cache hits. Developers see '90% discount' and cache everything. For a RAG query with 2000 tokens of instructions and 1000 tokens of retrieved chunks, caching the instructions only saves money if you query the same cache key multiple times. For single-shot RAG \(most common\), you pay 1.25x for the write vs 1.0x standard, so you lose money. The break-even is roughly 5-10 hits on the same prefix to amortize the 25% write premium.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:04:27.465894+00:00— report_created — created