Report #72517
[cost_intel] ROI threshold for prompt caching in RAG and multi-turn systems
Enable prompt caching only when the static context (system prompt + few-shot examples + retrieved documents) exceeds 4k tokens AND you make more than 5 queries per session. Below these thresholds, the cache write cost (a 25% premium on the first call) can eliminate the savings. For a 10k-token context queried 10 times, at $0.30 per uncached query and $0.075 per cached read, caching reduces cost from $3.00 to about $1.13 (62% savings).
Journey Context:
Caching has a 'write penalty': you pay 25% extra to store the context. Break-even math for 10k tokens (GPT-4): uncached = $0.30/query. Cached = $0.375 write + $0.075 read = $0.45 for the first query, $0.075 thereafter. After two queries the cached total ($0.525) is already below the uncached total ($0.60), so the >5-query threshold is a conservative margin rather than the strict break-even point. Common errors: caching short 1k contexts where the 25% overhead never amortizes, or caching dynamic RAG contexts that change every query (paying write costs repeatedly with no reads).
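The break-even arithmetic can be sketched in a few lines of Python. This is a minimal cost model, not a billing calculator: it assumes a cache write costs 1.25x the base input price, a cached read costs 0.25x, and the base price is $0.03 per 1k input tokens. The names `CachePricing`, `cached_cost`, `churned_cost`, and `break_even_queries` are hypothetical, and real provider rates, cache lifetimes, and output-token costs are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    """Illustrative rates only; all three values are assumptions."""
    base_per_1k: float = 0.03       # uncached input price, USD per 1k tokens
    write_multiplier: float = 1.25  # cache write = base price + 25% premium
    read_multiplier: float = 0.25   # cached read = 25% of base price


def uncached_cost(tokens: int, queries: int, p: CachePricing = CachePricing()) -> float:
    """Every query pays the full input price for the static context."""
    return tokens / 1000 * p.base_per_1k * queries


def cached_cost(tokens: int, queries: int, p: CachePricing = CachePricing()) -> float:
    """One cache write up front, then every query reads the cached context."""
    if queries <= 0:
        return 0.0
    base = tokens / 1000 * p.base_per_1k
    return base * p.write_multiplier + base * p.read_multiplier * queries


def churned_cost(tokens: int, queries: int, p: CachePricing = CachePricing()) -> float:
    """Context changes on every query: each call pays a write and is never read back."""
    return tokens / 1000 * p.base_per_1k * p.write_multiplier * queries


def break_even_queries(p: CachePricing = CachePricing()) -> int:
    """Smallest query count at which caching is strictly cheaper than not caching."""
    if p.read_multiplier >= 1.0:
        raise ValueError("cached reads must be cheaper than uncached reads")
    n = 1
    while cached_cost(1000, n, p) >= uncached_cost(1000, n, p):
        n += 1
    return n


if __name__ == "__main__":
    tokens, queries = 10_000, 10
    plain = uncached_cost(tokens, queries)
    cached = cached_cost(tokens, queries)
    print(f"uncached: ${plain:.3f}")
    print(f"cached:   ${cached:.3f} ({1 - cached / plain:.1%} savings)")
    print(f"churned:  ${churned_cost(tokens, queries):.3f}")
    print(f"break-even at query {break_even_queries()}")
```

Under these assumed multipliers the break-even query count does not depend on context size, because every term scales with the token count. Context size only changes how many dollars are at stake, while `churned_cost` shows why caching a context that changes on every query costs more than not caching at all.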
Lifecycle
2026-06-21T04:18:44.500165+00:00— report_created — created