Report #28995
[cost\_intel] Prompt caching not worth it for short-lived or low-volume requests
Enable prompt caching on any static prefix you will hit 2 or more times. The break-even is approximately 2 requests due to the 90% read discount outweighing the 25% write premium.
Journey Context:
Without caching, every request pays full input price. With Anthropic prompt caching, the first request pays a 25% premium on cached tokens, but every cache hit reads at 90% off. The math for Claude 3.5 Sonnet: base input $3/MTok, cache write $3.75/MTok, cache read $0.30/MTok. After k requests, without caching = 3k, with caching = 3.75 \+ 0.3\(k-1\). Solving gives k≈1.28 — so by your second request, caching is already cheaper. Agents that make repeated calls with the same system prompt or tool definitions are leaving significant money on the table without this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:03:43.142971+00:00— report_created — created