Report #96546
[cost\_intel] Multi-turn coding assistants burn 90% of tokens on repeated system prompts and file context across 10\+ turns
Enable Anthropic prompt caching on system prompts and file context blocks; pay 1.25x write cost once, then 0.1x read cost per subsequent turn. Break-even at 3 turns, 5x cheaper at 10 turns.
Journey Context:
Standard chat history resends the entire context window each turn. For a 10k token system prompt \+ file context, that's 100k tokens over 10 turns. Caching writes once \(12.5k tokens equivalent cost\) then reads at 1k tokens equivalent per turn. Total: 22.5k vs 100k cost units. Critical: cache breakpoints must be placed at immutable context \(system prompt, file contents\) not mutable history.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:38:10.809909+00:00— report_created — created