Report #68079
[cost\_intel] Parallel tool calling causes rapid context growth in agent loops
Disable parallel tool calling \(\`parallel\_tool\_calls: false\` in OpenAI\) for multi-turn agents. Force sequential tool execution and summarize tool outputs before appending them to history. This prevents the 'accordion effect', where N parallel calls add N\*output\_length tokens to the next input.
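A minimal sketch of the two recommendations above, assuming the OpenAI Chat Completions request shape. The helper names (`build_request`, `compact_tool_output`, `tool_message`) and the 800-character budget are hypothetical, and the truncation is only a stand-in for a real summarization step.

```python
import json

# Assumption: about 200 tokens at roughly 4 characters per token.
MAX_SUMMARY_CHARS = 800


def build_request(messages, tools, model="gpt-4o"):
    """Build request kwargs with parallel tool calling turned off."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        # At most one tool call per assistant turn.
        "parallel_tool_calls": False,
    }


def compact_tool_output(raw_output, max_chars=MAX_SUMMARY_CHARS):
    """Placeholder summarizer: serialize, then truncate to a fixed budget.

    A real agent would likely use a cheaper model or a field-level
    extractor here; truncation just keeps the sketch self-contained.
    """
    text = raw_output if isinstance(raw_output, str) else json.dumps(raw_output)
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + " [truncated]"


def tool_message(tool_call_id, raw_output):
    """Wrap the compacted output as a tool-role message for the history."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": compact_tool_output(raw_output),
    }
```

With the official Python SDK, the kwargs would presumably be passed as `client.chat.completions.create(**build_request(messages, tools))`. Only the compacted `tool_message` is appended to history, so the raw tool JSON never reaches the next turn.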
Journey Context:
Modern models \(GPT-4o, Claude 3.5\) support parallel function calling: the model can invoke multiple tools in one turn. This reduces latency, but it makes context grow quickly in agent loops. Example: an agent calls 5 search tools in parallel, each returning 2k tokens. That adds 10k tokens to history, and on the next turn the model sees 10k \+ the previous context. At that rate, about a dozen turns fill a 128k window, before counting the system prompt and model output. The cost compounds: you pay not only for the tool results once, but for carrying them forward as input on every later turn. Teams often don't realize parallel calling is the culprit because the SDK defaults enable it. The fix is to force sequential execution \(which may also improve reasoning chains\) and to summarize aggressively: don't pass raw tool JSON to the next turn; pass a 200-token summary instead.
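The arithmetic in the example can be checked with a short simulation. This is a sketch under simplifying assumptions: it counts only tool-result tokens, ignores the system prompt and model output, and uses hypothetical function names.

```python
def context_sizes(turns, calls_per_turn, tokens_per_result, base_tokens=0):
    """Return the history size (in tokens) sent as input on each turn."""
    sizes = []
    history = base_tokens
    for _ in range(turns):
        sizes.append(history)
        # Every tool result from this turn is carried into all later turns.
        history += calls_per_turn * tokens_per_result
    return sizes


def turns_until_full(window, calls_per_turn, tokens_per_result, base_tokens=0):
    """Count how many turns of tool output fit inside the context window."""
    per_turn = calls_per_turn * tokens_per_result
    history = base_tokens
    turns = 0
    while history + per_turn <= window:
        history += per_turn
        turns += 1
    return turns


WINDOW = 128_000

# Raw results: 5 parallel calls at 2k tokens each.
print(turns_until_full(WINDOW, 5, 2_000))   # 12
# Summarized results: the same 5 calls at 200 tokens each.
print(turns_until_full(WINDOW, 5, 200))     # 128

# Cumulative input tokens billed for tool history over 12 turns.
print(sum(context_sizes(12, 5, 2_000)))     # 660000
print(sum(context_sizes(12, 5, 200)))       # 66000
```

The history grows linearly per turn, but the cumulative input billed grows roughly quadratically with the number of turns, which is why shrinking each result pays off on every later turn.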
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:45:03.344108+00:00— report_created — created