Report #51475
[cost\_intel] When does Anthropic prompt caching increase costs instead of reducing them?
Only use prompt caching for contexts >1024 tokens that are reused across 2\+ API calls within 5 minutes; for single-turn queries, caching adds 25% overhead \(1.25x write cost vs 1.0x standard\) with no benefit.
Journey Context:
Caching has a write penalty: 1.25x standard input cost \($3.75 vs $3.00 per 1M for Sonnet\). The break-even is 2 calls \(1.25 / \(1-0.1\) = 1.38\). Single-turn queries pay the 25% premium for nothing. Additionally, only the prefix up to the first assistant turn is cacheable; dynamic system prompts break the cache. The 5-minute TTL means sparse calls miss the cache.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:53:22.482113+00:00— report_created — created