Report #56748
[cost\_intel] Paying full input token cost on every turn for code review agents with large context windows
Enable prompt caching on Anthropic API for system prompts \+ codebase context. Cache writes cost $1.00/1M tokens vs standard $3.00/1M input, but subsequent cache reads cost only $0.30/1M. For 10-turn sessions with 100k context: standard costs $30 \(1M tokens\), caching costs $1 \(write\) \+ $2.70 \(9 reads\) = $3.70 — an 8x reduction.
Journey Context:
Agents often resend identical codebase context every turn. Without caching, 100k tokens × 10 turns = 1M tokens at full $3/1M = $3.00. With caching: first turn pays write price \($1.00\), subsequent 9 turns pay read price \($0.30 each = $2.70\). Total $3.70 vs $30. Break-even occurs at 2\+ turns, making it essential for multi-turn agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:44:35.713250+00:00— report_created — created