Report #25191
[cost_intel] Enabling prompt caching for all system prompts without analyzing turn depth distribution
Cache system prompts only when the expected number of turns per session exceeds 3. For single-turn RAG, cut cost with a smaller model rather than with caching.
Journey Context:
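Caching pays for itself only when the cached prefix is read again. On providers that bill cache writes at a premium (Anthropic, for example, bills a 5-minute cache write at 1.25x the base input price and a cache read at 0.1x), a session that ends after one turn pays the write premium and never earns it back. OpenAI's prompt caching is applied automatically with no separate write charge, so the write-premium argument does not apply there as stated; check the current pricing page for whichever provider is in use. In general, with a write multiplier w and a read multiplier r (both relative to the uncached input price), caching a prefix across n requests costs w + r(n - 1) instead of n, so it wins only when n > (w - r) / (1 - r). A turn that arrives after the cache entry has expired pays the write price again, so a threshold above the bare break-even point leaves some margin. By this report's estimate, most HTTP API integrations average about 1.2 turns per session, which means most sessions are single-turn; at a 1.25x write price, each of those sessions pays 25% more for the cached prefix and gets no cache hit in return.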
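The break-even reasoning above can be checked with a small cost model. This is a minimal sketch, not any provider's billing logic: `cached_cost`, `break_even_requests`, and `cost_ratio` are hypothetical helper names, and the multipliers and session list in the example are illustrative assumptions to be replaced with current provider pricing and a measured turn-depth distribution. It also assumes every turn after the first hits the cache, which will not hold if turns are spaced further apart than the cache lifetime.

```python
import math
from typing import Iterable


def cached_cost(n_requests: int, write_mult: float, read_mult: float) -> float:
    """Relative cost of the cached prefix over one session.

    Assumes one cache write on the first request and a cache hit on every
    later request. Costs are in units of one uncached prefix.
    """
    if n_requests < 1:
        raise ValueError("n_requests must be >= 1")
    return write_mult + (n_requests - 1) * read_mult


def break_even_requests(write_mult: float, read_mult: float) -> int:
    """Smallest request count at which caching is strictly cheaper.

    Derived from w + r(n - 1) < n, i.e. n > (w - r) / (1 - r).
    """
    if not 0 <= read_mult < 1:
        raise ValueError("read_mult must be in [0, 1)")
    threshold = (write_mult - read_mult) / (1 - read_mult)
    return max(1, math.floor(threshold) + 1)


def cost_ratio(turn_counts: Iterable[int], write_mult: float, read_mult: float) -> float:
    """Cached cost divided by uncached cost across a set of sessions."""
    counts = list(turn_counts)
    cached = sum(cached_cost(n, write_mult, read_mult) for n in counts)
    uncached = sum(counts)  # each uncached request pays 1x for the prefix
    return cached / uncached


if __name__ == "__main__":
    # Assumed multipliers: 1.25x write, 0.1x read (verify against current pricing).
    w, r = 1.25, 0.1
    # Hypothetical sample of sessions averaging 1.2 turns.
    sessions = [1, 1, 1, 1, 2]
    print("break-even requests:", break_even_requests(w, r))
    print("cached/uncached cost ratio:", round(cost_ratio(sessions, w, r), 4))
```

A ratio above 1.0 means caching costs more than it saves for that traffic mix. Feeding in real per-session turn counts from logs, rather than an average, is the point of analyzing the turn depth distribution before enabling caching.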
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:41:34.398125+00:00— report_created — created