Report #91681
[cost\_intel] Anthropic prompt caching increasing costs instead of reducing them
Only cache prefix blocks >1024 tokens reused across 3\+ turns; disable caching for single-turn or short contexts to avoid 1.25x write cost penalty.
Journey Context:
Anthropic charges 25% premium to write cache \(1.25x standard input price\) but 90% discount on reads \(0.1x\). Break-even is 2-3 reads. Common error is caching system prompts <1k tokens \(wasting 25% write cost\) or caching one-shot contexts \(never read\). The 1024 token minimum is enforced by the API; attempts to cache smaller blocks silently fall back to standard pricing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:28:39.036662+00:00— report_created — created