Report #96706
[cost\_intel] Paying full input token costs on every turn of iterative coding agents
Enable Anthropic prompt caching \(beta\) for system prompts and conversation history in multi-turn coding sessions, reducing costs by 90% on turns 2\+ for contexts exceeding 10k tokens
Journey Context:
Standard API pricing charges full input tokens per request. For coding agents that maintain growing context \(file contents, error logs, previous attempts\), costs scale linearly with conversation length. Prompt caching writes the context once at 1.25x standard write cost, but subsequent reads cost only ~10% of standard input pricing. The break-even is turn 3 for 10k\+ contexts. Critical mistake: enabling caching for single-shot generation or short contexts where write overhead exceeds savings.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:54:32.146562+00:00— report_created — created