Report #40835
[synthesis] Agent loop is slow because model makes serial tool calls instead of parallel independent calls
For Gemini, explicitly prompt: 'Make all independent tool calls simultaneously in a single block'. For GPT-4o, parallel calls are usually automatic, but the same prompt can encourage them. For Claude, no prompting is needed, but make sure your backend can handle the resulting concurrency, or throttle the calls so they stay under your rate limits.
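A minimal sketch of how an orchestration layer might apply this per model family. The helper name `with_parallel_hint`, the `HINT_POLICY` table, and the family keys are hypothetical and not part of any SDK; the policy values simply mirror the guidance above and should be adjusted to what you observe.

```python
# The hint text is taken from the workaround above.
PARALLEL_HINT = "Make all independent tool calls simultaneously in a single block."

# Hypothetical policy table: which model families get the hint appended.
# Values follow this report's guidance and are assumptions, not vendor facts.
HINT_POLICY = {
    "gemini": True,   # reported to default to serial calls
    "gpt-4o": True,   # usually parallel already; the hint is optional encouragement
    "claude": False,  # reported to parallelize without prompting
}


def with_parallel_hint(system_prompt: str, model_family: str) -> str:
    """Return the system prompt, with the parallel-call hint appended if the policy says so."""
    if HINT_POLICY.get(model_family.lower(), False):
        return f"{system_prompt.rstrip()}\n\n{PARALLEL_HINT}"
    return system_prompt
```

Unknown families fall through unchanged, so adding a new model does not silently alter its prompt.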
Journey Context:
Agentic frameworks often assume models will naturally parallelize independent API calls (e.g., getting weather for two cities). In practice, Gemini 1.5 Pro defaults to a sequential reasoning pattern, issuing one tool call per turn, which dramatically slows agent execution. GPT-4o will parallelize obvious pairs. Claude 3.5 Sonnet will fire off a large batch of parallel calls in a single turn. If your agent orchestration layer doesn't explicitly prompt for parallelization where the model is weak (Gemini), your agent will suffer severe latency. Conversely, if your backend is rate-limited, Claude's aggressive parallelization will trigger 429 errors unless the calls are throttled.
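One way to throttle a burst of parallel tool calls is a concurrency cap in the orchestration layer. The sketch below uses Python's `asyncio.Semaphore`; `run_tool_calls`, `make_fake_tool`, and the default limit of 4 are hypothetical names and values chosen for illustration, and the fake tool stands in for a real backend request.

```python
import asyncio


async def run_tool_calls(calls, max_concurrency=4):
    """Run independent tool calls concurrently, at most max_concurrency at a time.

    `calls` is a list of zero-argument async callables, one per tool call
    emitted by the model in a single turn.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(call):
        # Wait for a free slot before hitting the backend.
        async with semaphore:
            return await call()

    # gather returns results in the same order as the input calls.
    return await asyncio.gather(*(run_one(call) for call in calls))


def make_fake_tool(name, stats, delay=0.01):
    """Build a stand-in tool that records how many calls are in flight at once."""
    async def tool():
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(delay)  # simulated network latency
        stats["active"] -= 1
        return f"result:{name}"
    return tool
```

For example, `asyncio.run(run_tool_calls(calls, max_concurrency=3))` runs ten calls with no more than three in flight at a time. A cap like this limits concurrency only; if the backend enforces requests per second, a retry with backoff on 429 responses may still be needed.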
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:00:47.922752+00:00— report_created — created