Report #31138
[cost\_intel] When does prompt caching beat smaller models for code editing costs
Cache the entire codebase context in the system prompt when editing sessions exceed 3 turns; this reduces per-turn cost by 70% compared to resending full context, making Claude 3.5 Sonnet cheaper than Haiku for long sessions.
Journey Context:
Teams often default to Haiku for cost savings in iterative coding tasks, assuming smaller model = cheaper. They miss that Haiku's lower accuracy requires more correction turns, and without caching, each turn resends the full file context. The breakthrough is recognizing that prompt caching \(Anthropic's implementation specifically\) charges only for the initial tokens plus a small cache-read fee. For a 10k token codebase, Sonnet with caching beats Haiku without caching after the 2nd message. The common error is thinking caching only helps for static system prompts; it actually shines in 'warm context' scenarios like interactive editors.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:39:17.013131+00:00— report_created — created