Report #65732
[cost_intel] OpenAI Parallel Function Calling Sequential Fallback Multiplication
Explicitly set `parallel_tool_calls: true` in the API request when the tools are independent. If the tools have dependencies, implement client-side DAG execution to minimize round trips instead of relying on the model to sequence the calls one turn at a time.
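A minimal sketch of client-side DAG execution, assuming the client knows the dependency graph ahead of time and can derive each dependent tool's inputs from earlier results without asking the model. The `run_dag` helper, the `(deps, fn)` registry format, and the three tools are hypothetical stand-ins, not part of any OpenAI SDK. Tools whose dependencies are satisfied run concurrently in one "wave"; dependent tools wait for the next wave, so the chain resolves locally with no extra model round trips.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# Hypothetical registry format: name -> (dependency names, async tool function).
# Each tool function receives a dict mapping its dependencies to their results.
ToolFn = Callable[[Dict[str, Any]], Awaitable[Any]]


async def run_dag(tools: Dict[str, Tuple[List[str], ToolFn]]) -> Dict[str, Any]:
    """Run tools in topological waves; every ready tool in a wave runs concurrently."""
    results: Dict[str, Any] = {}
    pending = dict(tools)
    while pending:
        ready = [name for name, (deps, _) in pending.items()
                 if all(d in results for d in deps)]
        if not ready:
            # Nothing can run: the remaining tools form a cycle or need an unknown tool.
            raise ValueError(f"cycle or missing dependency among: {sorted(pending)}")
        outputs = await asyncio.gather(
            *(pending[name][1]({d: results[d] for d in pending[name][0]})
              for name in ready)
        )
        for name, out in zip(ready, outputs):
            results[name] = out
            del pending[name]
    return results


# Hypothetical tools standing in for real API calls.
async def get_weather(deps: Dict[str, Any]) -> Dict[str, Any]:
    await asyncio.sleep(0.01)
    return {"city": "Paris", "temp_c": 21}


async def get_stock(deps: Dict[str, Any]) -> Dict[str, Any]:
    await asyncio.sleep(0.01)
    return {"ticker": "ACME", "price": 100.0}


async def summarize(deps: Dict[str, Any]) -> str:
    # Depends on both lookups, so it runs in the second wave.
    w, s = deps["weather"], deps["stock"]
    return f"{w['city']}: {w['temp_c']}C, {s['ticker']} at {s['price']}"


TOOLS = {
    "weather": ([], get_weather),
    "stock": ([], get_stock),
    "summary": (["weather", "stock"], summarize),
}

if __name__ == "__main__":
    print(asyncio.run(run_dag(TOOLS))["summary"])
```

Here `weather` and `stock` share the first wave and `summary` runs alone in the second; only the final aggregated result would need to go back to the model.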
Journey Context:
Developers assume function calling is token-efficient, but under OpenAI's default `tool_choice: auto` the model often requests tools one at a time when it judges them to be potentially dependent. Example: weather, stock price, and news lookups are independent, yet if the model requests them sequentially, each round trip resends the full conversation context (8k tokens) plus the results so far: the context plus the first result, then the context plus the first two results, and so on. Three sequential round trips therefore send roughly three times the input tokens of the parallel path, where one model response contains all three `tool_calls` and one follow-up request carries all three results. Cost difference: roughly 3x on those follow-up requests. The trap is that `auto` mode favors caution over cost, and some SDK versions default to sequential calls. The solution is to set `parallel_tool_calls: true` explicitly (this permits, but does not guarantee, multiple calls in one response) and to handle result aggregation client-side.
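A rough sketch of the request shape, the client-side aggregation step, and the token arithmetic described above, assuming the Chat Completions message format: an assistant message whose `tool_calls` entries carry an `id` plus `function.name` and a JSON-string `function.arguments`, answered by `role: "tool"` messages that echo `tool_call_id`. The helper names, the placeholder model name, and the token counts (8k context, 200 tokens per result) are illustrative assumptions. The estimate counts only the follow-up requests that carry tool results; it ignores the initial request (identical in both paths) and the tokens of the tool calls themselves.

```python
import json
from typing import Any, Callable, Dict, List


def build_request(messages: List[dict], tools: List[dict]) -> dict:
    """Chat Completions request body; parallel_tool_calls=True allows, but does
    not force, several tool calls in a single assistant message."""
    return {
        "model": "gpt-4o",  # placeholder model name (assumption)
        "messages": messages,
        "tools": tools,
        "tool_choice": "auto",
        "parallel_tool_calls": True,
    }


def tool_result_messages(assistant_message: dict,
                         registry: Dict[str, Callable[..., Any]]) -> List[dict]:
    """Execute every tool call from one assistant turn and return one role='tool'
    message per call, so all results go back in a single follow-up request.
    Calls run one after another here for simplicity; they could run concurrently."""
    out = []
    for call in assistant_message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"] or "{}")
        out.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return out


def input_tokens(context: int, result: int, n_tools: int, parallel: bool) -> int:
    """Input tokens across the follow-up requests that carry tool results.
    Sequential: request k resends the context plus k results (k = 1..n).
    Parallel: one request resends the context plus all n results."""
    if parallel:
        return context + n_tools * result
    return sum(context + k * result for k in range(1, n_tools + 1))
```

With those assumed numbers, `input_tokens(8000, 200, 3, parallel=False)` gives 25,200 tokens against 8,600 for `parallel=True`, about 2.9x, which is where the "roughly 3x" figure comes from.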
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:48:40.297651+00:00— report_created — created