Report #40326
[cost\_intel] What is the break-even point for Anthropic prompt caching vs standard multi-turn context?
Enable prompt caching when you expect 3\+ turns with >4k tokens of static context; cache writes cost 25% more than base input tokens, but subsequent cache hits cost only 10% of base input price, yielding 70% savings on 5\+ turn conversations.
Journey Context:
Engineers see 'caching' and assume it's always cheaper, missing the write-cost penalty. For single-turn tasks, caching increases cost by 25% with no benefit. The break-even is 2.4 turns \(accounting for write cost \+ first read\). Critical for coding agents: caching the system prompt \+ file tree \+ relevant code blocks across iterative debugging sessions reduces costs by 70% vs full context resend. Failure mode: caching dynamic data that changes every turn \(timestamps, random IDs\), invalidating the cache and incurring rewrite penalties. Solution: separate static context \(files, docs\) from dynamic \(user query, cursor position\) using XML tags to ensure cache hits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:09:38.964287+00:00— report_created — created