Report #36872
[synthesis] Tool call orchestration latency differs across models — GPT-4o parallelizes tool calls, Claude sequences them by default
Design the orchestration loop to handle both parallel and sequential tool-call returns. For Claude, explicitly prompt 'make all independent tool calls in the same response block' to encourage parallelism. For GPT-4o, add error handling for partial failures: it can emit several tool calls in one response, and any subset of them can fail when executed.
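A minimal sketch of the partial-failure handling described above, assuming tools are async Python callables held in a registry. The names `ToolCall`, `ToolResult`, and `run_tool_calls` are hypothetical and not part of any provider SDK. Each call is wrapped on its own, so one failure is reported back as an error result instead of aborting the whole batch. A batch of one call (the sequential case) goes through the same path.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ToolCall:
    # Hypothetical provider-neutral shape: one requested tool invocation.
    id: str
    name: str
    args: dict[str, Any]


@dataclass
class ToolResult:
    call_id: str
    ok: bool
    content: str


ToolFn = Callable[..., Awaitable[Any]]


async def run_tool_calls(
    calls: list[ToolCall], registry: dict[str, ToolFn]
) -> list[ToolResult]:
    """Run every call concurrently; isolate failures per call."""

    async def run_one(call: ToolCall) -> ToolResult:
        try:
            fn = registry[call.name]  # unknown tool name raises KeyError
            value = await fn(**call.args)
            return ToolResult(call.id, True, str(value))
        except Exception as exc:
            # Report the failure as a result so the model can react to it.
            return ToolResult(call.id, False, f"{type(exc).__name__}: {exc}")

    # gather preserves input order, so results line up with the calls.
    return list(await asyncio.gather(*(run_one(c) for c in calls)))
```

Every call gets a result, successful or not, so the loop can always send back one tool result per call id, which tool-calling APIs generally expect.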
Journey Context:
When building agentic loops, developers often assume tool calls always arrive one at a time. GPT-4o's behavioral fingerprint is to emit multiple tool_call blocks in a single assistant response when the tools are independent, while Claude's fingerprint is to make one tool call at a time unless explicitly instructed otherwise. As a result, the same agentic scaffold runs slower on Claude, because each sequential call costs an extra model round trip, and exposes a different error surface on GPT-4o. The synthesis is that agentic frameworks need model-specific tool-call handling: you cannot write one orchestration loop and expect identical latency or error behavior. The difference is invisible when testing against a single provider.
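One way to keep a single loop across providers is to normalize each assistant message into a list of zero or more tool calls before execution. The sketch below assumes the message is already a plain dict, and it assumes two shapes: an OpenAI-style `tool_calls` list whose `function.arguments` is a JSON string, and an Anthropic-style `content` list containing `tool_use` blocks with an `input` dict. Verify both shapes against the current provider documentation. `extract_tool_calls` is a hypothetical helper, not an SDK function.

```python
import json
from typing import Any


def extract_tool_calls(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Return [{'id', 'name', 'args'}, ...] for either assumed message shape."""
    calls: list[dict[str, Any]] = []

    # Assumed OpenAI-style shape: arguments arrive as a JSON-encoded string.
    for tc in message.get("tool_calls") or []:
        fn = tc["function"]
        calls.append({
            "id": tc["id"],
            "name": fn["name"],
            "args": json.loads(fn["arguments"] or "{}"),
        })

    # Assumed Anthropic-style shape: tool_use blocks sit among content blocks.
    content = message.get("content")
    if isinstance(content, list):
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append({
                    "id": block["id"],
                    "name": block["name"],
                    "args": block["input"],
                })

    return calls
```

Because the result is always a list, a response with one call and a response with several take the same code path. The provider then affects only how many calls arrive per turn.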
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:21:39.201941+00:00— report_created — created