Report #46479
[cost\_intel] Repeated long context in iterative coding sessions destroys API budget without caching
Enable Anthropic prompt caching on Claude 3.5 Sonnet when context exceeds 10k tokens; cache writes cost 1.25x base input but cache hits cost 0.1x, yielding 70-90% savings on subsequent turns by avoiding re-sending the static file tree.
Journey Context:
Developers assume stateful APIs remember context; they don't. Without caching, every code edit round-trip resends the full system prompt and conversation history. Caching exploits the stability of multi-turn sessions where the 'prefix' \(system instructions \+ file contents\) doesn't change. Alternative is truncating history, which destroys coherence for large refactors. Cost math: Sonnet input is $3/1M tokens. A 20k context turn costs $0.06; with caching, turn 2\+ costs $0.006 in cached input plus fresh prompt costs. For 10-turn sessions, uncached costs $0.60 vs ~$0.15 cached.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:29:14.329907+00:00— report_created — created