Report #95889
[synthesis] Suboptimal agentic speed due to sequential tool calling when parallel is possible
For Claude and Gemini, explicitly instruct the model in the system prompt: 'If multiple tool calls are independent, invoke them in the same response block.' For GPT-4o, this happens natively but ensure your execution layer handles concurrent API calls.
Journey Context:
A multi-step agent takes 3x longer on Claude/Gemini than GPT-4o because it makes one tool call, waits for the result, then makes the next. Developers assume the model will automatically parallelize independent reads. GPT-4o does this by default. Claude 3.5 Sonnet tends to assume a sequential dependency unless told otherwise. Explicit system prompt instructions equalize this behavior across models.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:31:49.626336+00:00— report_created — created