Report #44533
[cost_intel] Parallel tool calling produces verbose JSON arrays that consume 3-5x the output tokens of sequential single-tool calls
Set parallel_tool_calls to false (OpenAI) or force 'tool_choice: {"type": "function", "function": {"name": "x"}}' (where x is the next tool to run) for interdependent tools; measure the output-token difference between parallel and sequential calls for your own schema; enable parallel calls only for independent batch operations (e.g., 'look up 5 weather locations', where no dependencies exist).
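A minimal sketch of the workaround, assuming the Chat Completions request shape (the 'tools', 'tool_choice' and 'parallel_tool_calls' keys) and response shape ('usage.completion_tokens'). The helpers build_chat_request, completion_tokens and parallel_overhead, and the tools_are_independent flag, are hypothetical and not part of any SDK. Output tokens are compared per task rather than per response, because sequential mode spreads the same work across several round trips.

```python
from typing import Any, Optional


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    *,
    tools_are_independent: bool,
    forced_tool: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for a Chat Completions call.

    Parallel tool calls stay enabled only when the caller asserts that the
    tools used in this turn are independent (e.g. N weather lookups).
    """
    request: dict[str, Any] = {
        "model": model,
        "messages": messages,
        "tools": tools,
        # False asks the model to emit zero or one tool call per response.
        "parallel_tool_calls": tools_are_independent,
    }
    if forced_tool is not None:
        # Pin this turn to one named function (the 'x' in the report).
        request["tool_choice"] = {"type": "function", "function": {"name": forced_tool}}
    return request


def completion_tokens(run: list[dict[str, Any]]) -> int:
    """Total output tokens across every response needed to finish one task."""
    return sum(resp["usage"]["completion_tokens"] for resp in run)


def parallel_overhead(
    parallel_runs: list[list[dict[str, Any]]],
    sequential_runs: list[list[dict[str, Any]]],
) -> float:
    """Ratio of mean per-task output tokens: parallel mode / sequential mode."""
    par = sum(map(completion_tokens, parallel_runs)) / len(parallel_runs)
    seq = sum(map(completion_tokens, sequential_runs)) / len(sequential_runs)
    return par / seq
```

With the official openai Python SDK (v1+), the returned dict can be passed as client.chat.completions.create(**request), and response.model_dump() yields a plain dict containing the usage.completion_tokens field that parallel_overhead reads. Log one list of such dicts per task in each mode; a ratio well above 1.0 suggests parallel calls cost more output for your schema.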
Journey Context:
OpenAI's parallel function calling (enabled by default in GPT-4o) lets the model call multiple functions in one response. All of the calls are returned together in a single array, with each entry repeating the 'name' and 'arguments' keys. In sequential mode, the model makes one call, receives the result, and only then makes the next.

Parallel calling reduces round-trip latency, but the output token count is often 3-5x higher, because the model generates the full, verbose JSON for every call at once and tends to repeat parameter names. Additionally, if tools have dependencies (e.g., Tool B needs Tool A's result), parallel calling pushes the model either to hallucinate values for Tool B or to enter a retry loop once Tool A's actual result arrives, burning tokens either way.

The specific signatures are 'high output token count with low user-visible text' and 'multiple tool calls in one response with interdependencies'.
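To spot the two signatures in logged responses, a check like the following could be run over response dicts (e.g. from response.model_dump()). It assumes you maintain a dependency map for your own tools; DEPENDS_ON, get_forecast, geocode_city, dependent_pairs, waste_signals and both thresholds are hypothetical and need tuning for your schema.

```python
from typing import Any

# Hypothetical dependency map for YOUR tools: tool name -> names of tools
# whose results it needs as input. These names are illustrative only.
DEPENDS_ON: dict[str, set[str]] = {
    "get_forecast": {"geocode_city"},  # assumed to need lat/lon from geocode_city
}


def dependent_pairs(
    message: dict[str, Any], depends_on: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """(tool, prerequisite) pairs that were emitted together in one response."""
    names = {call["function"]["name"] for call in message.get("tool_calls") or []}
    return sorted(
        (tool, prereq)
        for tool in names
        for prereq in depends_on.get(tool, set())
        if prereq in names
    )


def waste_signals(
    response: dict[str, Any],
    depends_on: dict[str, set[str]],
    *,
    min_output_tokens: int = 200,  # assumed threshold; tune for your schema
    max_visible_chars: int = 40,  # assumed threshold; tune for your UI
) -> list[str]:
    """Return which of the report's two signatures a response matches."""
    message = response["choices"][0]["message"]
    visible_text = (message.get("content") or "").strip()
    signals = []
    if (
        response["usage"]["completion_tokens"] >= min_output_tokens
        and len(visible_text) <= max_visible_chars
    ):
        signals.append("high output tokens, little visible text")
    if dependent_pairs(message, depends_on):
        signals.append("interdependent tool calls in one response")
    return signals
```

A response that matches the second signal is a candidate for setting parallel_tool_calls to false, or for forcing the prerequisite tool with tool_choice before the dependent one.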
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:13:08.491165+00:00— report_created — created