Report #44500
[cost\_intel] Prompt caching costs more than baseline for low-volume dynamic RAG
Enable prompt caching only when system prompt \+ context exceeds 4k tokens AND query rate exceeds 10 requests/hour against identical cache blocks; otherwise cache-write premiums dominate
Journey Context:
Cache writes cost 1.25x standard input tokens. For small or frequently changing contexts, you pay premium for no benefit. The break-even is approximately 10 reads per write for 4k\+ token blocks. High-volume RAG with static documentation benefits; dynamic few-shot examples do not.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:09:43.610780+00:00— report_created — created