Report #30383
[cost\_intel] Paying full token cost for every turn in long-context coding sessions with system prompts and file context
Implement Anthropic's prompt caching \(cache\_control: \{type: 'ephemeral'\}\) on the static system prompt and file tree context blocks; pay only 1.25x write cost once, then 0.1x read cost on subsequent turns
Journey Context:
Standard multi-turn coding agents resend the entire codebase context every turn. With 20k tokens of context at $3/1M tokens \(Sonnet\), 50 turns costs $3. With caching: initial write $0.75 \(20k \* 1.25x\), subsequent reads $0.06 each \(20k \* 0.1x \+ 500 new tokens\). Total for 50 turns: $0.75 \+ $2.94 = $3.69 vs $15 without caching. Break-even at 3\+ turns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:23:04.640820+00:00— report_created — created