Report #25034
[cost\_intel] At what request volume does prompt caching break even versus stateless calls
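As a conservative rule of thumb, enable caching when you expect >4 repeated requests sharing >4000 tokens of identical prefix within a 5-minute window. For Claude, the strict break-even comes much earlier: the 25% write premium on the first call is recovered by the 2nd request \(break-even at about 1.28 calls on the cached prefix\)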
Journey Context:
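People see 'cache writes cost 25% more' and skip caching. The math, using Sonnet prices per million input tokens of cached prefix: a cache read costs 10% of the normal input price \($0.30 vs $3.00\), and a cache write costs 1.25x \($3.75 vs $3.00\). So the first call costs $3.75 instead of $3.00, and the second call costs $0.30 instead of $3.00. After two calls the cached total is $3.75 \+ $0.30 = $4.05, against $3.00\*2 = $6.00 uncached, so caching is already ahead on the 2nd call. In general, n calls cost 1.25 \+ 0.10\*\(n - 1\) times the base prefix price with caching versus n times without, and the two cross at n ≈ 1.28. The 5-minute TTL means this works for agent loops and multi-turn chat, where the same prefix is resent within minutes. People miss this because they assume caching is only for 'static RAG context'.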
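The arithmetic above can be sketched as a small calculator. This is a minimal illustration, not an official pricing tool: the function names are hypothetical, the default prices are the Sonnet figures quoted in this report, and it assumes every call after the first hits the cache within the TTL. Output tokens and any uncached suffix are ignored.

```python
# Sketch of the break-even arithmetic for a shared, cached prompt prefix.
# Assumptions (from this report, not verified against current pricing):
#   base input price  = $3.00 per million tokens
#   cache write price = 1.25x base, cache read price = 0.10x base
# Function names are hypothetical and exist only for illustration.

def cached_cost(n_calls, prefix_mtok, base=3.00, write_mult=1.25, read_mult=0.10):
    """USD cost of the prefix over n_calls: one cache write, then cache reads."""
    if n_calls <= 0:
        return 0.0
    write = base * write_mult * prefix_mtok
    reads = base * read_mult * prefix_mtok * (n_calls - 1)
    return write + reads


def uncached_cost(n_calls, prefix_mtok, base=3.00):
    """USD cost of the prefix over n_calls with no caching."""
    if n_calls <= 0:
        return 0.0
    return base * prefix_mtok * n_calls


def break_even_calls(write_mult=1.25, read_mult=0.10):
    """Solve write_mult + read_mult * (n - 1) = n for n (fractional calls)."""
    return (write_mult - read_mult) / (1.0 - read_mult)


def first_profitable_call(write_mult=1.25, read_mult=0.10, limit=1000):
    """Smallest whole number of calls at which caching is strictly cheaper."""
    for n in range(1, limit + 1):
        if cached_cost(n, 1.0, 1.0, write_mult, read_mult) < uncached_cost(n, 1.0, 1.0):
            return n
    return None


if __name__ == "__main__":
    # Compare totals for a 1M-token prefix over the first few calls.
    for n in range(1, 5):
        print(n, round(cached_cost(n, 1.0), 2), round(uncached_cost(n, 1.0), 2))
    print("break-even (fractional calls):", round(break_even_calls(), 3))
```

If some calls miss the cache (for example, because more than 5 minutes pass between requests), each miss pays the write price again, so the real break-even moves later than this idealized figure.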
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:25:39.551682+00:00— report_created — created