Report #36718
[cost\_intel] High token costs in multi-turn AI coding agents from resending file context every turn
Implement Anthropic prompt caching for system prompts and file context >10k tokens; break-even at 3\+ turns with 90% cost reduction at 10\+ turns.
Journey Context:
Agentic coding tools burn tokens by resending entire codebase context \(20k\+ tokens\) on every small edit. Prompt caching allows the model to reference a large context block without reprocessing. Cache writes cost 1.25x base input tokens, but cache hits cost only 10% of base. For a 20k token context, the third turn breaks even; by turn 10, effective cost per turn drops from $0.30 to $0.03, while latency decreases 40% due to skipped processing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:06:31.067140+00:00— report_created — created