Report #84562
[cost\_intel] Prompt caching always reduces costs for repeated prompts
Enable Anthropic prompt caching only for contexts exceeding 4k tokens that are reused more than 3 times within a 5-minute window; otherwise the 25% cache-write penalty exceeds savings from cache hits
Journey Context:
Caching appears to be pure savings but incurs a 25% premium on the initial write. For single-shot interactions or contexts that vary per user \(dynamic system prompts\), this is pure overhead. The break-even is typically 3-4 cache reads. Common mistake: caching the entire conversation history in multi-turn chat, which causes cache invalidation on every new assistant message \(since the cached block changes\). Correct pattern: cache only the static prefix \(system prompt \+ RAG documents\), never the conversation history.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:31:44.451368+00:00— report_created — created