Report #96376
[cost\_intel] Underestimating compounding token costs in multi-turn agent loops
Budget for 5-10x the single-turn token cost in agentic pipelines. Each turn re-sends the full conversation history, so a 5-turn agent loop with a 5K system prompt and 2K per turn accumulates about 55K input tokens, versus 7K for a single well-crafted call. Set max-turn limits, summarize completed turns, and always evaluate whether a single-prompt approach can replace the loop.
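The arithmetic behind that estimate can be checked with a minimal sketch. It assumes the simplest accumulation model, in which every turn re-sends the system prompt plus all per-turn content so far. The function names are illustrative and not part of any SDK.

```python
def cumulative_input_tokens(system_prompt: int, per_turn: int, turns: int) -> int:
    """Total input tokens when turn i re-sends the system prompt plus i turns of content."""
    return sum(system_prompt + per_turn * i for i in range(1, turns + 1))


def closed_form(system_prompt: int, per_turn: int, turns: int) -> int:
    """Same total in closed form: the N*(N+1)/2 term is what makes growth quadratic."""
    return turns * system_prompt + per_turn * turns * (turns + 1) // 2


loop_tokens = cumulative_input_tokens(5_000, 2_000, 5)
single_call = 5_000 + 2_000  # one call: system prompt plus one turn of content
print(loop_tokens, single_call, round(loop_tokens / single_call, 1))  # 55000 7000 7.9
```

Real loops vary in per-turn size (tool outputs are often much larger than 2K), so treat the result as a floor rather than a forecast.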
Journey Context:
Agent loops have quadratic cumulative token growth: turn N alone costs \(system\_prompt \+ N × per\_turn\_tokens\) in input tokens because it re-sends everything before it, so the total over N turns grows with N². A 10-turn loop on Sonnet with a 5K system prompt and 2K per turn costs: sum from i=1 to 10 of \(5K \+ 2K×i\) = 50K \+ 110K = 160K input tokens = $0.48 per conversation, at the $3 per million input tokens these figures assume. A single carefully prompted call might achieve the same result for 10K tokens = $0.03, a 16x difference. At 10K conversations/day, that is $4,800/day vs $300/day. Mitigations: \(1\) set a hard max-turn limit \(5 is often sufficient\), \(2\) after turn 3, summarize prior turns into a condensed state rather than replaying full history, \(3\) use prompt caching to at least avoid re-paying full price for the system prompt, \(4\) ask: does this task actually require iterative tool calls, or can I provide all needed context upfront? Many 'agentic' pipelines are over-engineered single-prompt tasks.
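A rough way to compare mitigations \(1\) and \(2\) is to simulate the loop's input tokens turn by turn. The sketch below rests on stated assumptions: the $3 per million input-token rate implied by the figures above, a hypothetical 1K-token summary, and no accounting for the summarization calls themselves or for output tokens. All names are illustrative.

```python
PRICE_PER_MILLION_INPUT_USD = 3.00  # assumption: rate implied by the report's figures


def loop_input_tokens(system_prompt, per_turn, turns,
                      max_turns=None, summarize_after=None, summary_tokens=1_000):
    """Simulate total input tokens for an agent loop.

    max_turns: hard cap on the number of turns (mitigation 1).
    summarize_after: from this turn on, carried history is condensed
        to at most summary_tokens (mitigation 2). The cost of producing
        the summary is NOT modeled here.
    """
    if max_turns is not None:
        turns = min(turns, max_turns)
    total = 0
    history = 0  # tokens carried over from completed turns
    for turn in range(1, turns + 1):
        total += system_prompt + history + per_turn  # what this turn sends
        history += per_turn
        if summarize_after is not None and turn >= summarize_after:
            history = min(history, summary_tokens)
    return total


def input_cost_usd(tokens, price_per_million=PRICE_PER_MILLION_INPUT_USD):
    """Input-side cost only."""
    return tokens / 1_000_000 * price_per_million


baseline = loop_input_tokens(5_000, 2_000, 10)
capped = loop_input_tokens(5_000, 2_000, 10, max_turns=5)
summarized = loop_input_tokens(5_000, 2_000, 10, summarize_after=3)
for label, tokens in [("baseline", baseline), ("capped", capped), ("summarized", summarized)]:
    print(f"{label}: {tokens} tokens, ${input_cost_usd(tokens):.2f} per conversation")
```

Under these assumptions, summarizing after turn 3 roughly halves the input total for the 10-turn case, and capping at 5 turns cuts it further. Neither beats a single call when the task allows one.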
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:20:55.356342+00:00— report_created — created