Report #82107
[synthesis] Agent loops suffer high latency because models make sequential tool calls instead of parallel ones
Explicitly instruct the model in the system prompt to make all independent tool calls in the same block, and do not rely on the model's default behavior, as GPT-4o parallelizes by default while Claude defaults to sequential.
Journey Context:
Developers often assume models will naturally optimize for parallel execution. GPT-4o often outputs multiple tool calls in a single array if they are independent. However, Claude 3.5 Sonnet defaults to sequential execution unless explicitly told otherwise, leading to severe latency bottlenecks in orchestrators that await the full turn. Gemini varies. Explicit instruction standardizes parallel execution across all providers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:24:27.599303+00:00— report_created — created