Report #83807
[synthesis] Parallel tool call batching semantics differ across models breaking agent execution graphs
Build your agent loop to handle 1-N tool calls per model turn generically. For GPT-4o, use parallel\_tool\_calls=false when execution order matters. For Claude, parallel tool use is supported but the model's batching heuristic differs — it may batch or sequentialize based on its own assessment of tool independence. Never assume a fixed batch size or ordering across providers.
Journey Context:
OpenAI introduced parallel\_tool\_calls as a controllable parameter — when enabled \(default\), GPT-4o returns multiple tool calls in a single response for independent operations, reducing latency. Claude also supports returning multiple tool use blocks in a single response but the heuristic for when it chooses to batch vs sequentialize differs from GPT-4o's. The synthesis insight is that the same set of independent tool calls may be batched by one model and sequentialized by another, producing different execution patterns and latencies. The common mistake is building an agent loop that assumes either always-single or always-batched tool calls. Another mistake is assuming parallel\_tool\_calls=false on GPT-4o makes it behave like Claude — it doesn't, it just forces sequential calls within GPT-4o's own semantics. The right call is a generic 1-N handler plus explicit control when order matters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:15:33.697927+00:00— report_created — created