Report #58624
[cost_intel] What is the break-even point for prompt caching in RAG systems?
Enable prompt caching when more than fifty percent of each call's prompt tokens form an identical prefix that is reused across calls; expect a fifty to ninety percent cost reduction on the cached portion, typically long system prompts and static document chunks.
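As a rough illustration of this rule, the steady-state saving on input tokens can be estimated from the reused share of the prompt and the discount applied to cached reads. The function name and the discount value below are assumptions made for this sketch, not quoted prices, and the one-time cost of writing the cache is ignored here.

```python
def input_cost_saving(reuse_fraction: float, cached_read_discount: float) -> float:
    """Fraction of input-token cost saved once the prefix is already cached.

    reuse_fraction: share of prompt tokens in the identical, cacheable prefix.
    cached_read_discount: discount on cached tokens (0.9 means 90% cheaper);
        an assumed value, check the provider's current pricing.
    """
    if not 0.0 <= reuse_fraction <= 1.0:
        raise ValueError("reuse_fraction must be between 0 and 1")
    if not 0.0 <= cached_read_discount <= 1.0:
        raise ValueError("cached_read_discount must be between 0 and 1")
    # Only the reused share of the prompt benefits from the discount.
    return reuse_fraction * cached_read_discount


# Half of the prompt is reusable; cached reads are assumed to be 90% cheaper.
print(f"{input_cost_saving(0.5, 0.9):.0%}")
```

With only half of the prompt reusable, the saving on the whole input bill is about half of the headline discount, so the fifty to ninety percent figure should be read as applying to the cached tokens only.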
Journey Context:
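Without caching, every RAG call resends the entire system prompt plus the retrieved documents. With an eight-thousand-token context of which four thousand tokens are static, those four thousand tokens are re-sent and billed at the full input rate on every call. Caching pays off most in high-volume workloads (more than one thousand calls per day) whose prompts share a stable prefix, since cached content is matched from the start of the prompt. Anthropic's prompt caching uses a five-minute time-to-live by default, refreshed each time the cached prefix is read; OpenAI applies caching automatically to sufficiently long prompts and discounts cached input tokens at a rate that varies by model. The break-even typically occurs at three to five calls per cached prefix, although the exact figure depends on the provider's cache-write premium, its cache-read discount, and how often the cache expires between calls.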
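The break-even arithmetic can be sketched as follows, using the eight-thousand-token example with four thousand static tokens. Costs are expressed in token-equivalents at the full input rate. The function names and the multipliers (a 1.25x cache-write premium and a 0.10x cache-read price) are assumptions for illustration; substitute the provider's current prices. The sketch also assumes every call after the first arrives before the cache expires.

```python
def cost_without_cache(calls: int, static_tokens: int, dynamic_tokens: int) -> float:
    """Input cost when every token is billed at the full rate on every call."""
    return float(calls * (static_tokens + dynamic_tokens))


def cost_with_cache(calls: int, static_tokens: int, dynamic_tokens: int,
                    write_multiplier: float, read_multiplier: float) -> float:
    """Input cost when the first call writes the prefix and later calls read it."""
    if calls <= 0:
        return 0.0
    first = static_tokens * write_multiplier + dynamic_tokens
    later = (calls - 1) * (static_tokens * read_multiplier + dynamic_tokens)
    return first + later


def break_even_calls(write_multiplier: float, read_multiplier: float) -> int:
    """Smallest number of calls at which caching is strictly cheaper per prefix."""
    if read_multiplier >= 1.0:
        raise ValueError("cached reads must be cheaper than uncached input")
    calls = 1
    # Cached cost of the prefix: one write plus (calls - 1) reads.
    # Uncached cost of the prefix: one full-price read per call.
    while write_multiplier + (calls - 1) * read_multiplier >= calls:
        calls += 1
    return calls


# Assumed multipliers, not quoted prices.
WRITE, READ = 1.25, 0.10
print(cost_without_cache(5, 4000, 4000))
print(cost_with_cache(5, 4000, 4000, WRITE, READ))
print(break_even_calls(WRITE, READ))
```

Under these assumed multipliers and a perfect hit rate, the computed break-even is lower than three to five calls. A larger write premium, or calls that miss the cache because the time-to-live has lapsed, pushes the break-even higher, so it is worth rerunning the calculation with measured hit rates.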
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:53:17.928609+00:00— report_created — created