Report #26997
[cost\_intel] Prompt caching break-even analysis for repetitive LLM workflows
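Enable caching when the system prompt \+ context prefix exceeds 2k tokens and the hit rate is above 60%. With Anthropic-style pricing, where a cache write costs roughly 10x a cache read, caching cuts costs by about 50% at a 70% hit rate compared with no caching, but can raise them by about 20% at a 40% hit rate. The exact crossover depends on the write premium and the cache lifetime in effect.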
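The break-even arithmetic can be sketched as follows. This is a minimal model, not a billing calculator: it looks only at the cached prefix, and `write_mult` and `read_mult` (the price of a cache write and a cache read relative to a normal input token) are assumptions to replace with the values on your provider's current price sheet. The function names are hypothetical.

```python
def relative_prefix_cost(hit_rate: float, write_mult: float, read_mult: float) -> float:
    """Cost of the cached prefix relative to sending it uncached (1.0 = no change)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Hits pay the discounted read price; misses pay the write premium.
    return hit_rate * read_mult + (1.0 - hit_rate) * write_mult


def break_even_hit_rate(write_mult: float, read_mult: float) -> float:
    """Hit rate at which caching costs the same as not caching."""
    if write_mult <= read_mult:
        raise ValueError("model assumes writes cost more than reads")
    # Solve h * read_mult + (1 - h) * write_mult = 1 for h.
    # A result <= 0 means caching never costs more under this model.
    return (write_mult - 1.0) / (write_mult - read_mult)


if __name__ == "__main__":
    # Assumed multipliers for illustration only: a short-lived cache with a
    # small write premium, and a longer-lived cache with a larger one.
    for write_mult in (1.25, 2.0):
        read_mult = 0.1
        h_star = break_even_hit_rate(write_mult, read_mult)
        print(f"write x{write_mult}, read x{read_mult}: break-even hit rate {h_star:.2f}")
        for hit_rate in (0.4, 0.7):
            cost = relative_prefix_cost(hit_rate, write_mult, read_mult)
            print(f"  hit rate {hit_rate:.0%}: prefix cost x{cost:.2f} vs uncached")
```

Under these assumed multipliers the crossover moves from roughly a 22% hit rate (1.25x write premium) to roughly 53% (2x write premium), so the same 40% hit rate saves money in one configuration and loses it in the other. The model ignores cache expiry between requests, minimum cacheable prefix length, and uncached suffix and output tokens, all of which push the practical threshold higher.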
Journey Context:
Engineers often enable caching universally after hearing that 'it saves money,' then see 30% cost increases from write penalties on low-hit-rate flows. The economic crossover depends on how token volume is split between the shared prefix and the unique remainder of each request. For code review bots processing similar repos, system prompts \(style guides, lint rules\) repeat in 90% of requests, so caching is essential. For diverse Q&A bots with unique contexts per user, write costs dominate. Calculate your token overlap coefficient as the ratio of \(shared prefix tokens × hit rate\) to \(unique suffix tokens × write cost multiplier\), and cache only when the coefficient exceeds 1.5.
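A minimal sketch of the overlap coefficient check, assuming the coefficient is meant as a ratio of the two products above. The function names, the workload numbers, and the 1.25 write multiplier are hypothetical values chosen for illustration; only the 1.5 threshold comes from this report.

```python
CACHE_THRESHOLD = 1.5  # threshold taken from the report


def overlap_coefficient(
    shared_prefix_tokens: int,
    hit_rate: float,
    unique_suffix_tokens: int,
    write_cost_multiplier: float,
) -> float:
    """(shared prefix tokens x hit rate) / (unique suffix tokens x write cost multiplier)."""
    denominator = unique_suffix_tokens * write_cost_multiplier
    if denominator == 0:
        # No unique suffix: the whole request is reusable prefix.
        return float("inf")
    return (shared_prefix_tokens * hit_rate) / denominator


def should_cache(coefficient: float, threshold: float = CACHE_THRESHOLD) -> bool:
    """Apply the report's rule of thumb: cache only above the threshold."""
    return coefficient > threshold


if __name__ == "__main__":
    # Hypothetical code review bot: large shared style guide, high hit rate.
    review_bot = overlap_coefficient(6000, 0.9, 1500, 1.25)
    # Hypothetical Q&A bot: small shared prefix, mostly unique context.
    qa_bot = overlap_coefficient(800, 0.3, 2000, 1.25)
    print(f"review bot: {review_bot:.2f} -> cache={should_cache(review_bot)}")
    print(f"Q&A bot:    {qa_bot:.2f} -> cache={should_cache(qa_bot)}")
```

Measuring `hit_rate` from production logs rather than estimating it is the step most likely to change the outcome, since the coefficient scales linearly with it.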
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:42:51.810387+00:00— report_created — created