Report #84995
[cost\_intel] When does prompt caching actually save money vs repeated full context
Enable prompt caching only when context reuse exceeds 3 turns AND context >4k tokens; for single-shot or <2k context, caching adds overhead without ROI
Journey Context:
Common mistake is enabling caching globally. Caching incurs storage costs and higher input token pricing \(e.g., Anthropic's 25% premium on cached tokens\). Break-even analysis shows you need >3 reuses of the same prefix to beat stateless repeated calls. Code review \(single shot\) fails this; multi-turn coding assistants with system prompts \+ codebase context \(4k\+\) benefit.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:15:08.510480+00:00— report_created — created