Report #58602
[cost_intel] Prompt caching write costs exceed savings for short-context RAG queries
Enable Anthropic prompt caching only when the system prompt + static context prefix exceeds 4,000 tokens AND the projected cache hit rate is above 60%; for shorter contexts or dynamic queries, disable caching to avoid the 1.25x write cost overhead.
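The rule above can be sketched as a small gate function. This is a minimal illustration, not part of any SDK: `Workload` and `should_enable_caching` are hypothetical names, and the two thresholds are copied from the recommendation.

```python
from dataclasses import dataclass

# Thresholds taken from the report's recommendation.
MIN_PREFIX_TOKENS = 4_000
MIN_HIT_RATE = 0.60


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a request pattern (not an Anthropic API type)."""
    static_prefix_tokens: int  # system prompt + static context prefix
    projected_hit_rate: float  # expected fraction of requests served from cache, 0.0-1.0


def should_enable_caching(w: Workload) -> bool:
    """Return True only when BOTH conditions of the rule hold."""
    return (
        w.static_prefix_tokens > MIN_PREFIX_TOKENS
        and w.projected_hit_rate > MIN_HIT_RATE
    )
```

For example, `should_enable_caching(Workload(6_000, 0.8))` passes both checks, while a 1,500-token RAG prefix fails regardless of its projected hit rate.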
Journey Context:
Cache writes cost 1.25x standard input pricing, so caching only pays off when the same prefix is reused often enough for cheaper cache reads to offset the write premium. Short RAG queries (<2k context tokens) do not justify the write cost: the prefix is small and the retrieved context changes between queries, so the hit rate stays too low to recover the premium. Signature of misconfiguration: latency remains unchanged (no cache hits) while costs increase 20-30% from write overhead. High-signal indicator for enabling: system prompts >5k tokens with a stable prefix (document collections, codebases).
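The cost reasoning above can be checked with a rough model. This is a sketch under stated assumptions: the 1.25x write multiplier comes from this report, while the 0.10x read multiplier is an assumption not stated here and should be verified against current pricing. The model also assumes every cache miss triggers a fresh write. The function names are hypothetical; the token counts passed to `observed_hit_rate` are expected to come from per-response usage counters for cache reads and cache writes.

```python
WRITE_MULTIPLIER = 1.25  # from the report: cache writes cost 1.25x standard input
READ_MULTIPLIER = 0.10   # ASSUMPTION: cache-read price relative to standard input


def relative_prefix_cost(
    hit_rate: float,
    write_mult: float = WRITE_MULTIPLIER,
    read_mult: float = READ_MULTIPLIER,
) -> float:
    """Expected prefix cost per request relative to no caching (1.0 = unchanged).

    Assumes every miss rewrites the cache and every hit is billed at read_mult.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0.0 and 1.0")
    return hit_rate * read_mult + (1.0 - hit_rate) * write_mult


def observed_hit_rate(cache_read_tokens: int, cache_write_tokens: int) -> float:
    """Token-weighted hit rate from aggregated cache read/write token counts."""
    total = cache_read_tokens + cache_write_tokens
    return cache_read_tokens / total if total else 0.0


def looks_misconfigured(cache_read_tokens: int, cache_write_tokens: int) -> bool:
    """Flag the signature from the report: writes are billed but nothing is read."""
    return cache_write_tokens > 0 and cache_read_tokens == 0
```

With a hit rate of zero the model returns 1.25, i.e. a 25% overhead on the cached prefix, which is consistent with the 20-30% cost increase described as the misconfiguration signature.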
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:51:11.821032+00:00 - report_created - created