Report #52996
[cost_intel] Unexpected 5-10x cost overrun on agentic coding and reasoning tasks compared to single-shot estimates
Budget for quadratic token growth in agentic loops. Each turn reprocesses the full accumulated conversation. For N turns averaging M new tokens per turn, total tokens processed ≈ N×M + (N×(N-1)×M)/2. Mitigate with: (1) mid-flight conversation summarization after 5-8 turns, (2) frontier model for first 2-3 planning turns then downgrade to small model for execution, (3) hard turn limits per task.
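The growth formula can be checked with a short sketch. It assumes every turn adds exactly M new tokens and resends the entire prior history, with no prompt caching or summarization. The function names are illustrative and not part of any library.

```python
def total_tokens_processed(turns: int, new_tokens_per_turn: int) -> int:
    """Simulate an agent loop in which each call resends the full history."""
    total = 0
    history = 0  # tokens accumulated from earlier turns
    for _ in range(turns):
        # This call processes the prior history plus this turn's new tokens.
        total += history + new_tokens_per_turn
        history += new_tokens_per_turn
    return total


def total_tokens_closed_form(turns: int, new_tokens_per_turn: int) -> int:
    """Closed form from the report: N*M + N*(N-1)*M/2."""
    n, m = turns, new_tokens_per_turn
    # N*(N-1) is always even, so integer division is exact.
    return n * m + n * (n - 1) * m // 2


if __name__ == "__main__":
    print(total_tokens_processed(10, 2000))
    print(total_tokens_closed_form(10, 2000))
```

Both functions should agree for any turn count. Doubling the number of turns roughly quadruples the total, which is why hard turn limits matter.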
Journey Context:
A single-shot prompt costing $0.05 balloons to $0.50-$1.00 in an agentic loop because every API call includes the full conversation history. A 10-turn agent loop with 2K new tokens per turn processes ~110K tokens in total (2K on the first call, 4K on the second, and so on up to 20K on the tenth), not the 20K you'd expect from simple multiplication. Teams that budget from per-turn new tokens are shocked by the bill. The most effective mitigation is a two-model strategy: use Sonnet/GPT-4o for the first 2-3 turns, where planning, architecture decisions, and tool selection happen, then switch to Haiku/mini for the remaining execution turns, where the model is following an established plan. This typically cuts total cost by 40-60% with minimal quality impact, since execution turns require less reasoning.
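A rough way to estimate the effect of the two-model strategy is sketched below. It reuses the report's simplification that turn k processes k×M tokens, and it prices every token at one flat per-million-token rate per model, ignoring input/output price differences and caching. The function name and the prices in the usage example are hypothetical placeholders, not real vendor rates; substitute your provider's current pricing.

```python
def two_model_cost(
    turns: int,
    new_tokens_per_turn: int,
    planning_turns: int,
    large_price_per_mtok: float,
    small_price_per_mtok: float,
) -> float:
    """Estimate loop cost when the first `planning_turns` use the large model."""
    cost = 0.0
    for turn in range(1, turns + 1):
        # Full history plus new tokens: turn k processes k * M tokens.
        tokens_this_turn = turn * new_tokens_per_turn
        if turn <= planning_turns:
            price = large_price_per_mtok
        else:
            price = small_price_per_mtok
        cost += tokens_this_turn / 1_000_000 * price
    return cost


if __name__ == "__main__":
    # Placeholder prices (per million tokens), chosen only for illustration.
    LARGE, SMALL = 3.0, 1.0
    baseline = two_model_cost(10, 2000, 10, LARGE, SMALL)  # large model throughout
    mixed = two_model_cost(10, 2000, 3, LARGE, SMALL)      # switch after turn 3
    print(f"baseline={baseline:.3f} mixed={mixed:.3f} "
          f"savings={1 - mixed / baseline:.0%}")
```

The saving depends heavily on the price ratio between the two models and on how early the switch happens, because the later turns carry the largest histories and therefore dominate the bill.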
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:26:51.174894+00:00— report_created — created