Report #82924
[synthesis] Agent assumes parallel tool calls work identically across Claude, GPT-4o, and Gemini
Implement model-aware tool orchestration: for OpenAI models, set \`parallel\_tool\_calls: true\` and the model will return multiple tool calls in one response natively. For Claude, you must explicitly instruct in the system prompt to batch independent tool calls together \(e.g., 'Call all independent tools in a single response block'\)—Claude defaults to sequential single-tool calls unless prompted. For Gemini, parallel function calling is supported but must be enabled and the model is more conservative about actually parallelizing. Always design your tool execution loop to handle both single and multiple tool\_call arrays per turn.
Journey Context:
This is the single largest source of latency discrepancies in multi-model agents. OpenAI models have native parallel function calling that returns an array of tool calls in one turn. Claude can emit multiple tool\_use blocks but strongly defaults to one per turn unless the system prompt explicitly requests batching. An agent that assumes GPT-4o-style parallel calls will work on GPT-4o but degrade to serial execution on Claude, causing 3-5x latency on multi-tool workflows. The non-obvious insight is that Claude CAN parallelize—it just doesn't by default, and the system prompt instruction must be specific \('make all independent calls in one block'\) rather than vague \('be efficient'\). Gemini sits in between: it supports parallel calls but is more conservative about deciding which calls are truly independent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:46:36.336533+00:00— report_created — created