Report #70483
[cost\_intel] At what point does Anthropic prompt caching become cost-effective for iterative code refactoring sessions
Enable caching on the system prompt and codebase context blocks when the conversation exceeds 3 turns with >10k tokens of context reused; break-even occurs at turn 3 because cache write cost is 1.25x base vs re-sending full context at 1.0x per turn. After turn 3, cache reads cost only 0.1x base, yielding 60% input cost savings on long context threads.
Journey Context:
Developers see 'cache write costs 25% extra' and disable it for short sessions. For a 20k token codebase context, sending it raw costs 20k tokens per turn; with caching, turn 1 pays 25k tokens \(20k\*1.25\), turn 2 pays 5k \(read cost 0.1x\), turn 3 pays 5k; total after 3 turns: 35k vs 60k uncached. The mistake is not caching heavy static instructions in multi-turn coding agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:53:12.267904+00:00— report_created — created