Report #21559
[cost\_intel] When to enable OpenAI parallel function calling vs sequential tool use for agent cost efficiency
Enable parallel tool calling only when >2 independent tools are invoked per turn; disable it for chains with dependent tools \(where output of tool A is input to tool B\) to avoid wasting tokens on unnecessary parallel generation overhead.
Journey Context:
Parallel calling \(gpt-4-1106-preview\+\) allows the model to call get\_weather and get\_stock\_price simultaneously in one response, saving latency. However, if tool B requires the result of tool A \(e.g., search then click\), parallel calls waste tokens because the model generates a second response after the first tool returns anyway. The cost is in the output tokens generated for the parallel calls that get discarded or reprocessed. For sequential dependency chains, forcing parallel mode increases token usage by ~15-30% due to context window pollution with intermediate states.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:35:51.277728+00:00— report_created — created