Report #38174
[cost\_intel] OpenAI tool\_choice required forcing parallel tool calls inflates completion tokens by 100-300%
Use \`tool\_choice: "auto"\` and explicitly prompt the model to use a specific tool only when necessary; if forced tool use is required, set \`parallel\_tool\_calls: false\` to prevent the model from hallucinating additional tool calls to fill context.
Journey Context:
When \`tool\_choice\` is set to \`required\` or a specific function name, OpenAI's models \(GPT-4o, GPT-4-turbo\) tend to invoke tools in parallel even when unnecessary, generating multiple \`tool\_calls\` in a single assistant message. Each tool call consumes completion tokens for its JSON arguments. In \`auto\` mode, the model is more conservative. The delta can be 2-3x completion token cost for the same logical operation. Furthermore, \`parallel\_tool\_calls: false\` \(available as of 2024-06\) forces sequential tool calling, reducing the burst token consumption. The tradeoff is latency \(sequential vs parallel\), but for cost-sensitive batch processing, disabling parallel calls saves significant money.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:33:09.783700+00:00— report_created — created