Report #35923
[cost_intel] Anthropic prompt caching break-even volume calculation
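Prompt caching breaks even as soon as an identical prefix (system prompt + few-shot examples) is read back from cache at least once: the write costs 1.25x the base input price and each later read costs 0.10x, so the second call on the same prefix already recovers the write premium. When the prefix changes on every call (dynamic chat context), every call is a cache write and the overhead never amortizes. Use caching for static pipelines and repeated classification tasks that share a prefix, not for requests whose context is unique each time.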
Journey Context:
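Teams often enable caching on every call on the assumption that repeated context always means savings. In practice, a cache write costs 1.25x the base input price and a cache read costs 0.10x. Measured in units of the prefix's base input cost, two calls on the same prefix cost 1.25 + 0.10 = 1.35 with caching versus 2.0 without, so a single cache hit already covers the 0.25 write premium. If the prefix changes on every call (dynamic chat), no read ever happens and every call pays for a write: two calls cost 1.25 + 1.25 = 2.5 versus 2.0 without caching, a 25% overhead that is never recovered. Break-even math for N calls on the same prefix: 1.25 + 0.10(N - 1) < N, which holds for every N >= 2.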
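The arithmetic above can be checked with a short script. This is a minimal sketch, not a billing tool: it assumes the 1.25x write and 0.10x read multipliers quoted in this report, measures cost in units of the cacheable prefix's base input price, and ignores cache expiry and any uncached per-request tokens (which cost the same with or without caching). All function names are hypothetical.

```python
# Multipliers as quoted in the report (assumptions; check current pricing).
WRITE_MULT = 1.25  # first call: prefix is written to the cache
READ_MULT = 0.10   # later calls: prefix is read from the cache


def cost_without_cache(calls: int) -> float:
    """Prefix cost when every call pays the full base input price."""
    return float(calls)


def cost_with_cache(calls: int,
                    write_mult: float = WRITE_MULT,
                    read_mult: float = READ_MULT) -> float:
    """Prefix cost when the same prefix is written once, then read."""
    if calls <= 0:
        return 0.0
    return write_mult + read_mult * (calls - 1)


def cost_prefix_never_reused(calls: int,
                             write_mult: float = WRITE_MULT) -> float:
    """Prefix cost when the prefix changes every call: all writes, no reads."""
    return write_mult * calls


def break_even_calls(write_mult: float = WRITE_MULT,
                     read_mult: float = READ_MULT,
                     max_calls: int = 1000):
    """Smallest call count at which caching is strictly cheaper, or None."""
    for n in range(1, max_calls + 1):
        if cost_with_cache(n, write_mult, read_mult) < cost_without_cache(n):
            return n
    return None


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(n, cost_without_cache(n), round(cost_with_cache(n), 2),
              cost_prefix_never_reused(n))
    print("break-even at call number:", break_even_calls())
```

Passing different multipliers to `break_even_calls` shows how the threshold moves if a pricier write tier applies; the defaults only reflect the figures stated in this report.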
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:46:14.707286+00:00— report_created — created