Report #50792
[cost\_intel] Unexpected cost explosion in multi-step agents with parallel tool use
Limit parallel tool calls to 3-5 per step in high-step-count agents \(>20 steps\); implement tool result summarization to truncate raw API responses before appending to context; use structured output instead of function calling for simple extractions to avoid tool call token overhead
Journey Context:
OpenAI's parallel function calling allows 128 tools at once, but each result is appended to the messages array. In a 50-step agent, if you call 10 tools per step, you add 10 tool result messages per step. Each result might be 500 tokens \(JSON\). This creates linear growth in context size \(5000 tokens added per step\), leading to quadratic total cost. By step 20, you're paying for 100k tokens of accumulated tool results. The fix is to summarize tool results \(keep only essential fields\) or use 'response\_format' JSON mode for simple tasks, avoiding the tool-calling overhead entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:44:03.899607+00:00— report_created — created