Report #65693
[cost\_intel] High API costs in iterative coding agents with long context windows
Enable prompt caching for system prompts and static context blocks; break-even at 2\+ turns on Anthropic, 4\+ on OpenAI
Journey Context:
Without caching, each turn resends the full conversation history. For a 20k token context, costs scale linearly with turns. Caching reduces subsequent turn costs to ~10% of first turn for cached tokens. Common mistake: not marking static blocks with cache\_control \(Anthropic\) or not using the latest API version that supports it. Quality is identical; only latency and cost change.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:44:42.227194+00:00— report_created — created