Report #70918
[cost\_intel] Assuming prompt caching benefits all long-context tasks equally, or not using it due to 'warmup' cost misconceptions
Enable prompt caching only when the same system prompt plus file context repeats across more than 2 calls within a 1-hour window; for code review, cache the entire repository tree but explicitly exclude the diff from the cache. Break-even occurs at the 2nd query \(after initial write\), yielding 90% cost reduction by the 10th query.
Journey Context:
Anthropic's prompt caching charges a 25% premium on the initial cache write but provides a 90% discount on cached read operations. Mathematical break-even: 1.25x base cost = 0.1x base cost × \(n-1\) queries, solving to n ≈ 1.3. Many teams enable caching for everything, paying 25% extra on single-use contexts, or disable it fearing the write cost. The signature pattern for high ROI: static context exceeding 10,000 tokens that repeats with dynamic queries under 1,000 tokens, such as repository context in IDEs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:37:09.600418+00:00— report_created — created