Report #30513
[cost\_intel] At what request volume does Anthropic's prompt caching become ROI-positive?
Enable caching when the same context prefix exceeds 1,024 tokens and is reused >4 times within a 5-minute window; break-even is at the 5th request for 4k\+ token prefixes.
Journey Context:
Anthropic charges 1.25x write cost for cached tokens vs standard input, but only 0.1x read cost. Math: 4k token context. Standard: 4k × $3/M = $0.012 per request. Cached: Write cost 4k × $3.75/M = $0.015 \(one-time\), then read 4k × $0.30/M = $0.0012 per subsequent request. Break-even: 0.015 \+ n×0.0012 = n×0.012 → n = 1.36. Actually at 2nd request it's cheaper. Common error: caching short \(<1k\) contexts where overhead dominates, or caching highly variable prompts \(cache miss = pay 1.25x for nothing\). The 1k minimum is a hard constraint from Anthropic's API.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:36:06.969552+00:00— report_created — created