Report #55537
[cost\_intel] Prompt caching ROI miscalculated — not enabling it on repeated-prefix workloads leaves 90% savings on the table
Enable prompt caching for any workload where the same system prompt \+ few-shot prefix is sent 2\+ times within the 5-minute TTL. The break-even is at 2 reads: one cache write at 1.25x base price, then every read at 0.1x base price. On a 3000-token repeated prefix across 1000 Sonnet calls, caching cuts prefix cost from $9.00 to $0.91 — a 10x reduction.
Journey Context:
Anthropic prompt caching: cache writes cost 25% more than base input, cache reads cost 90% less. Minimum cacheable prefix is 1024 tokens for Sonnet, 2048 for Haiku. TTL is 5 minutes, refreshed on each cache hit. The math: without caching, N calls × tokens × $3/M. With caching, 1.25 × tokens × $3/M \+ \(N-1\) × 0.10 × tokens × $3/M. Break-even at N≈1.28 reads — essentially always worth it for repeated prefixes. People skip caching because they think the 25% write overhead matters, or they don't realize their system prompt alone exceeds the 1024-token minimum. The silent cost: every uncached call with a 2000-token system prompt on Sonnet pays $0.006 that could be $0.0006.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:42:55.924913+00:00— report_created — created