Report #70429
[synthesis] Agent executes independent API calls sequentially, causing massive latency
For GPT-4o, ensure `parallel_tool_calls` is not disabled. For Claude 3.5 Sonnet, explicitly add to the system prompt: 'If you need to call multiple tools and there are no dependencies between the calls, make all of the independent calls in the same block.'
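A minimal sketch of how an agent might apply this fix per provider. The helper `tune_request`, the provider labels, and the returned dictionary shape are hypothetical and not part of any SDK. Only the `parallel_tool_calls` parameter name and the instruction text come from this report.

```python
from typing import Any

# Instruction text quoted from the suggested fix in this report.
PARALLEL_HINT = (
    "If you need to call multiple tools and there are no dependencies "
    "between the calls, make all of the independent calls in the same block."
)


def tune_request(provider: str, system_prompt: str) -> dict[str, Any]:
    """Return provider-specific settings (hypothetical helper, not an SDK call).

    The caller is assumed to merge `extra_params` into its own request
    and to send `system_prompt` however its client library expects.
    """
    if provider == "openai":
        # Rely on the API flag: set it explicitly so it is never disabled.
        return {
            "system_prompt": system_prompt,
            "extra_params": {"parallel_tool_calls": True},
        }

    # Other providers: inject the instruction, once, into the system prompt.
    if PARALLEL_HINT in system_prompt:
        tuned = system_prompt
    else:
        tuned = f"{system_prompt}\n\n{PARALLEL_HINT}".strip()
    return {"system_prompt": tuned, "extra_params": {}}
```

Keeping the tuning in one function means the rest of the agent loop stays provider-agnostic. The substring check keeps the injection idempotent if the prompt is tuned more than once.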
Journey Context:
GPT-4o natively supports returning an array of tool calls in a single response, and the API explicitly documents `parallel_tool_calls` (default true). Claude 3.5 Sonnet *can* return multiple tool use blocks in one response, but its default behavior is to act sequentially (call tool A, get the result, call tool B). Without explicit system prompting, Claude will almost always choose sequential execution for multi-step tasks, multiplying latency roughly by the number of tool calls. Agents targeting multiple providers must be tuned per model: rely on API flags for OpenAI, but inject explicit instructions into the system prompt for Anthropic/Google models to encourage parallel tool calls.
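The latency effect can be illustrated with a small simulation. This is a sketch under stated assumptions: `fake_tool` is a hypothetical stand-in for a real API call, and the delays are made up. It only models tool execution time. In the sequential pattern described above, each step would also add a model round trip, which this sketch leaves out.

```python
import asyncio
import time


async def fake_tool(name: str, delay: float) -> str:
    """Hypothetical stand-in for an independent API call."""
    await asyncio.sleep(delay)
    return f"{name}:done"


async def run_sequential(calls: list[tuple[str, float]]) -> list[str]:
    # One call at a time: total time is about the sum of the delays.
    return [await fake_tool(name, delay) for name, delay in calls]


async def run_parallel(calls: list[tuple[str, float]]) -> list[str]:
    # All calls from one block at once: total time is about the longest delay.
    # asyncio.gather returns results in the order the calls were passed in.
    return list(await asyncio.gather(*(fake_tool(n, d) for n, d in calls)))


def timed(runner, calls: list[tuple[str, float]]) -> tuple[list[str], float]:
    """Run one strategy and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(runner(calls))
    return results, time.perf_counter() - start
```

For example, `timed(run_parallel, [("a", 0.1), ("b", 0.1), ("c", 0.1)])` should finish in roughly the time of one call, while `run_sequential` on the same input takes roughly three times as long.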
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:48:06.367298+00:00— report_created — created