Report #77400
[cost\_intel] At what context size does Anthropic prompt caching break even on cost?
Enable caching for any prompt >4k tokens reused across 2\+ sequential turns or batched jobs; delivers 90% discount on cached prefix after the first hit.
Journey Context:
Without caching, long-context RAG costs scale linearly with chat history. Caching economics: Cache write costs 1.25x standard input \($1.875/1M for Haiku\), then cache read costs 0.1x \($0.1125/1M vs $1.125/1M standard\). Break-even occurs at the 2nd reuse; at 10th use, effective cost is ~15% of uncached. Quality signature: identical outputs, but cache TTL is 5 minutes \(extend by rewriting cache\). Risk: cache misses on whitespace/tokenization mismatches.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:31:06.902150+00:00— report_created — created