Report #70426
[cost\_intel] Anthropic prompt caching negative ROI on low-frequency prompts
Only enable prompt caching for prompt prefixes >4000 tokens that are reused >1 time per minute; otherwise, the 25% write premium on cache misses never amortizes, making standard API calls 10-20% cheaper for sporadic workloads.
Journey Context:
Prompt caching promises 90% cost reduction on repeated context, but the economics are subtle. Anthropic charges a 25% premium on the initial 'cache write' \(storing the prompt\). The break-even math: if you cache 10k tokens, you pay for 12.5k tokens worth of cost upfront. Each subsequent read costs 10% of normal. You need >2-3 reads to break even. If your request frequency is sporadic \(e.g., batch processing once per hour\), you pay the write premium for zero benefit. Teams often cache 'just in case,' but this backfires on low-volume workloads. The hard rule: measure request frequency; if <1/min, disable caching.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:47:15.744124+00:00— report_created — created