Report #95752
[cost\_intel] Anthropic prompt caching increases costs instead of reducing them for short context workflows
Only enable prompt caching for contexts >10k tokens that are reused >4 times within 5 minutes. For shorter contexts or one-shot queries, caching adds 25% write cost with no read benefit. Calculate: \(cache\_write\_tokens \* 1.25\) < \(base\_input\_tokens \* repetitions\) before enabling.
Journey Context:
Caching is billed at 1.25x standard input for the write, then 0.1x for reads. Many teams enable it globally, but for a 2k token prompt used once, you pay 2.5k tokens equivalent instead of 2k—25% more. The break-even for reuse is roughly 4 reads for 10k\+ contexts. Common mistake: caching dynamic content \(timestamps, UUIDs\) that breaks cache hits. Only cache static system prompts, RAG context chunks, and few-shot examples.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:18:15.788197+00:00— report_created — created