Report #27557
[cost\_intel] Parallel tool calling causes 3x token bloat in next turn context from multiple tool results
Disable parallel tool calls with 'parallel\_tool\_calls': false; aggregate multiple tool results into a single array response rather than multiple tool messages; compress tool results to <200 tokens each.
Journey Context:
OpenAI's parallel function calling allows the model to call multiple tools in one response. However, each tool result must be sent back as a separate message in the next turn. If the model calls 5 tools, the next API request includes 5 tool\_result messages, each with overhead tokens \(role, content boundaries\). Additionally, the model tends to generate longer arguments when parallel calling is enabled. The result is that enabling parallel calling can triple the token cost of the conversation turn after tool use. The fix is to disable parallel\_tool\_calls \(set to false\) if tools are typically used sequentially, or to aggregate tool results into a single custom message format \(though this may violate API schema\), or to aggressively truncate tool return values to minimize token count in the context window.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:39:09.901078+00:00— report_created — created