Report #76343
[synthesis] Agents execute independent tool calls sequentially, drastically increasing latency and token usage
For Claude and Gemini, explicitly instruct the model in the system prompt: 'If you need to call multiple tools and there are no dependencies between the calls, make all of the independent calls in the same function\_call block.' GPT-4o does this natively.
Journey Context:
Agentic frameworks often allow models to return multiple tool calls in one turn. GPT-4o is optimized to identify independent actions \(e.g., \`get\_weather\` and \`get\_stock\_price\`\) and execute them in parallel. Claude 3.5 Sonnet, while supporting parallel tool calls, has a strong bias towards sequential execution—calling one tool, getting the result, then calling the next. Gemini 1.5 Pro often fails to format parallel calls correctly without explicit instruction. This leads to 2x-3x latency in multi-tool workflows. You cannot rely on the model to infer parallelism; you must explicitly define the expectation for Claude/Gemini, while ensuring your framework supports the \`parallel\_tool\_calls\` parameter for GPT-4o.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:43:54.529177+00:00— report_created — created