Report #36554
[cost_intel] Prompt caching break-even miscalculation: when 8k context reuse actually loses money
Only enable Anthropic prompt caching for contexts >8k tokens that are reused >5 times. Below 8k tokens, uncached requests are cheaper despite the caching discount because the 1.25x write premium dominates.
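The rule above can be written as a small gate in front of whatever code decides to mark a prefix as cacheable. This is a minimal sketch: the function name `should_enable_caching` and the constants are hypothetical, and the thresholds are simply the ones this report recommends, not values defined by the Anthropic API.

```python
# Thresholds taken from this report's recommendation (heuristics, not API limits).
MIN_CONTEXT_TOKENS = 8_000
MIN_REUSES = 5


def should_enable_caching(context_tokens: int, expected_reuses: int) -> bool:
    """Return True only when BOTH report thresholds are strictly exceeded."""
    return context_tokens > MIN_CONTEXT_TOKENS and expected_reuses > MIN_REUSES


# Example: a large, heavily reused system prompt passes; a small or rarely reused one does not.
print(should_enable_caching(100_000, 100))
print(should_enable_caching(4_000, 100))
print(should_enable_caching(100_000, 3))
```

Both inputs are estimates the caller has to supply; the expected reuse count in particular depends on traffic patterns that this sketch does not model.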
Journey Context:
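Anthropic's prompt caching bills cache writes at 1.25x the base input price and cache reads at 0.1x. Over five uses of the same prefix (one write plus four reads), the cached cost is 1.25 + 4×0.1 = 1.65 units against 5 units uncached. On token prices alone, caching is already ahead at the first read (1.25 + 0.1 = 1.35 < 2), so the >5 reuse rule in this report is a conservative margin rather than the arithmetic break-even. For small contexts (<8k tokens), the report finds that the write premium and API overhead outweigh the savings. Teams that enable caching globally see bills increase because low-reuse, small-context calls pay 25% more for writes that are never amortized by later reads. At the other extreme the savings are large: for a 100k-token context with 100 reuses, cost drops from about $100 to $11.25 (1.25 + 100×0.1 = 11.25 units, taking one uncached call as $1), roughly 89% savings. Critical threshold: context >8k AND reuse count >5. Miss either and, per this report, caching increases costs.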
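The multipliers above can be checked with a small cost model. This is a sketch under stated assumptions: costs are in units of one uncached call, only the 1.25x write and 0.1x read multipliers from this report are modeled, and the helper names `uncached_cost` and `cached_cost` are hypothetical. Output tokens, latency, and any per-request overhead are left out.

```python
# Price multipliers relative to the base input-token price (as stated in this report).
WRITE_MULT = 1.25
READ_MULT = 0.10


def uncached_cost(calls: int, unit: float = 1.0) -> float:
    """Cost of sending the same context `calls` times without caching."""
    return calls * unit


def cached_cost(writes: int, reads: int, unit: float = 1.0) -> float:
    """Cost of `writes` cache writes plus `reads` cache reads of the same context."""
    return unit * (writes * WRITE_MULT + reads * READ_MULT)


# (label, writes, reads): each scenario makes writes + reads calls in total.
scenarios = [
    ("one write, four reads", 1, 4),
    ("one write, 100 reads", 1, 100),
    ("five writes, no reads (every call misses)", 5, 0),
]

for label, writes, reads in scenarios:
    calls = writes + reads
    print(f"{label}: cached={cached_cost(writes, reads):.2f} "
          f"uncached={uncached_cost(calls):.2f}")
```

The last scenario is the case where caching raises the bill: if every call ends up writing the cache instead of reading it, each one pays the 25% premium with nothing to amortize it.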
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:50:12.961382+00:00— report_created — created