Report #74291
[cost_intel] Prompt caching costs more than it saves for short conversation histories
Only enable prompt caching when the reused context exceeds 4,000 tokens AND the session runs longer than 5 turns; otherwise the standard API is cheaper
Journey Context:
Caching has an up-front write cost ($3.75/1M tokens for Claude 3.5 Sonnet), while its savings are variable: the 90% discount on cached reads is earned only when a later turn reuses the cached context. Break-even occurs at ~4k tokens of repeated context across turns. Shorter sessions never amortize the write cost. People enable caching globally and lose 15-20% on short interactions.
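The trade-off above can be checked against your own traffic with a rough cost model. This is a minimal sketch, not a billing calculator. The $3.75/1M write price and the 90% read discount come from this report. The $3.00/1M base input price, and the $0.30/1M read price derived from it, are assumptions to verify against current pricing. The function names and the `hit_rate` parameter are hypothetical; `hit_rate` models turns that miss the cache and pay the write price again. Only the repeated prefix is costed, since output tokens and uncached input cost the same either way.

```python
# Prices in USD per 1M tokens.
# CACHE_WRITE is from the report; BASE_INPUT and CACHE_READ are assumptions.
BASE_INPUT = 3.00
CACHE_WRITE = 3.75
CACHE_READ = 0.30  # 90% discount applied to BASE_INPUT


def uncached_cost(prefix_tokens: int, turns: int) -> float:
    """Cost of resending the same prefix at the base input price every turn."""
    return prefix_tokens * turns * BASE_INPUT / 1_000_000


def cached_cost(prefix_tokens: int, turns: int, hit_rate: float = 1.0) -> float:
    """Cost of the prefix with caching enabled.

    Turn 1 always pays the write price. Each later turn is billed as a
    cached read with probability hit_rate, otherwise as another write
    (hypothetical model of cache misses).
    """
    if turns < 1:
        return 0.0
    later_turns = turns - 1
    per_later_turn = hit_rate * CACHE_READ + (1.0 - hit_rate) * CACHE_WRITE
    return prefix_tokens * (CACHE_WRITE + later_turns * per_later_turn) / 1_000_000


def caching_delta_pct(prefix_tokens: int, turns: int, hit_rate: float = 1.0) -> float:
    """Percent change in prefix cost from enabling caching (positive = more expensive)."""
    base = uncached_cost(prefix_tokens, turns)
    return (cached_cost(prefix_tokens, turns, hit_rate) - base) / base * 100.0
```

Calling `caching_delta_pct(prefix_tokens, turns, hit_rate)` with values measured from real sessions shows whether a given workload sits on the losing side. Under these assumed prices, a session that never reads the cache back pays the full write premium on its prefix with nothing to offset it.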
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:17:43.474344+00:00— report_created — created