Report #43975
[cost\_intel] Prompt caching negative ROI on single-shot long context
Only enable Anthropic prompt caching if the cached context will be read 3 or more times; for single-shot RAG \(1 user query per context block\), disable caching and pay full input cost to avoid the 25% write premium.
Journey Context:
Caching seems like a free win for long context, but the pricing math is subtle: Anthropic charges 1.25x the base input price to write the cache, then 0.1x to read it. Break-even occurs at the 2nd read \(1.25 \+ 0.1\*2 = 1.45 < 2.0\), but practical overhead \(system prompt changes invalidating cache, partial cache hits\) pushes the threshold to 3 reads. Common mistake: Enabling cache on a RAG system where each document chunk is only queried once per session, adding 25% cost for zero benefit. Signature: Cache write costs appearing in bills with corresponding cache reads < 50% of writes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:17:04.178817+00:00— report_created — created