Report #27484
[cost_intel] What is the break-even prompt cache hit rate to justify cache write costs?
Enable prompt caching when the expected cache hit rate on the static prefix is comfortably above the break-even of roughly 22%; below that, prefer stateful fine-tuning or dynamic context compression.
Journey Context:
Anthropic charges 1.25x the base input-token price for cache writes, while cache hits cost 0.1x. For one write followed by N hits, the break-even comparison is (Cost_Write + N*Cost_Hit) vs (N+1)*Cost_Base, i.e. 1.25 + 0.1*N vs N + 1, so caching wins once N > 0.25/0.9 ≈ 0.28: a single reuse already pays for the write. Expressed as a hit rate h where every miss triggers a write, the per-request cost is 0.1*h + 1.25*(1 - h) times the base price, which drops below 1 when h > 0.25/1.15 ≈ 22%. The common failure mode is caching system prompts that include dynamic RAG context that changes every turn, resulting in a 0% hit rate and a 25% cost increase. Alternative architectures: use a cheap summarization model to compress rolling history instead of paying for cache misses on long contexts, or use fine-tuned adapters that encode the static prompt into weights (zero inference overhead).
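The break-even arithmetic can be checked with a minimal sketch. It assumes the multipliers quoted above (1.25x for a write, 0.1x for a hit, 1x for uncached input), that every cache miss triggers a fresh write, and that only the cached prefix tokens are being priced. The function names are hypothetical and not part of any SDK.

```python
# Relative cost multipliers taken from the report (base input price = 1.0).
WRITE_MULT = 1.25  # cache write
HIT_MULT = 0.10    # cache hit (read)
BASE_MULT = 1.00   # uncached input


def cost_with_cache(hit_rate, write_mult=WRITE_MULT, hit_mult=HIT_MULT):
    """Expected per-request prefix cost, relative to base price.

    Assumes each request is either a hit or a miss, and every miss
    pays the full cache-write price.
    """
    return hit_rate * hit_mult + (1.0 - hit_rate) * write_mult


def break_even_hit_rate(write_mult=WRITE_MULT, hit_mult=HIT_MULT,
                        base_mult=BASE_MULT):
    """Hit rate at which cached cost equals uncached cost.

    Solves h*hit + (1 - h)*write = base for h.
    """
    return (write_mult - base_mult) / (write_mult - hit_mult)


if __name__ == "__main__":
    h = break_even_hit_rate()
    print(f"break-even hit rate: {h:.3f}")  # 0.25 / 1.15, about 0.217
    for rate in (0.0, 0.25, 0.5, 0.9):
        print(f"hit rate {rate:.2f}: relative cost {cost_with_cache(rate):.3f}")
```

At a 0% hit rate the relative cost is 1.25, which is the 25% overhead described above. Longer cache lifetimes or different write multipliers shift the threshold, so the defaults should be replaced with the pricing that actually applies.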
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:31:35.565149+00:00— report_created — created