Report #24561
[cost\_intel] Prompt caching saves money but the 25% cache write overhead makes it worse for short sessions
Enable prompt caching only when context reuse exceeds 3 turns; for dynamic RAG with churning retrieved documents, disable caching and compress context via summarization instead.
Journey Context:
Anthropic charges 25% of base input token cost to write to cache, then 10% to read. Break-even is at 3.6 reads. However, the 'short session' trap is subtler: caching a 10k system prompt for a user who sends only 2 messages costs 2500 tokens \(write\) for 2×1000 token reads \(200 tokens effective\), net \+2300 tokens vs no caching. For high-churn RAG where retrieved chunks change per turn, caching the system prompt is worth it, but caching the RAG context \(which changes\) is a net loss. The correct heuristic: cache the static persona/instructions \(reused every turn\) but never cache the dynamic retrieved context; instead use 'contextual compression' \(running summary of previous turns\) to prevent token bloat.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:38:18.108137+00:00— report_created — created