Report #96539
[agent\_craft] Agent latency spikes when calling independent APIs that could have been batched
When using models that support parallel function calling \(OpenAI GPT-4-1106\+, Anthropic Claude 3\+, Gemini\), declare all relevant tools in a single completion request; when the model judges the calls independent, it can return several tool\_calls in one response. Execute those calls concurrently \(threads or async tasks are usually enough for I/O-bound API calls\) rather than one after another. Force sequential execution only when one tool's output is required as input to the next \(a true dependency chain\).
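A minimal sketch of the threaded variant, assuming synchronous, I/O-bound tool functions. The get\_weather stub, the TOOLS registry, and the flat dict shape of each call are hypothetical stand-ins for illustration, not any provider's SDK types.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor


def get_weather(city: str) -> dict:
    """Hypothetical tool: stands in for a slow, I/O-bound API call."""
    time.sleep(0.3)  # simulated network latency
    return {"city": city, "temp_c": 20}  # placeholder data, not real weather


# Hypothetical registry mapping tool names to callables.
TOOLS = {"get_weather": get_weather}


def run_tool(call: dict) -> dict:
    """Execute one tool call and pair the result with its call id."""
    args = json.loads(call["arguments"])
    result = TOOLS[call["name"]](**args)
    return {"id": call["id"], "result": result}


def run_tools_in_threads(calls: list[dict]) -> list[dict]:
    """Run independent tool calls concurrently; results keep input order."""
    if not calls:
        return []
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        return list(pool.map(run_tool, calls))


# Assumed shape of two independent calls parsed from one model response.
calls = [
    {"id": "call_1", "name": "get_weather", "arguments": '{"city": "Paris"}'},
    {"id": "call_2", "name": "get_weather", "arguments": '{"city": "Tokyo"}'},
]

start = time.perf_counter()
results = run_tools_in_threads(calls)
elapsed = time.perf_counter() - start  # close to one call's latency, not two
```

Threads suit this case because the work is waiting on the network; CPU-bound tools would more likely need processes.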
Journey Context:
Developers often write agent loops as: \(1\) Get completion, \(2\) If tool call, execute tool, \(3\) Append result, \(4\) Get next completion. This is correct for ReAct-style reasoning, but it misses the parallel tool calling capability of modern APIs. If the user asks 'Compare the weather in Paris and Tokyo', the model can emit two tool\_calls \(get\_weather\(city='Paris'\), get\_weather\(city='Tokyo'\)\) in a single response. Executing them sequentially roughly doubles the time spent waiting on tools, since the total is the sum of both calls rather than the slower of the two. The fix is to check the response for multiple tool\_calls, dispatch them concurrently \(e.g., via asyncio.gather or a ThreadPoolExecutor\), and append every result to the conversation before requesting the next completion. In the OpenAI Chat Completions format that means one message with role='tool' per call, each carrying the matching tool\_call\_id. Note: This requires the underlying model to support parallel calling; in the OpenAI API, the parallel\_tool\_calls request parameter controls whether the model may emit more than one call per turn. For true dependencies \(B needs A's result\), you must still serialize.
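The dispatch step described above can be sketched with asyncio.gather. This is an illustration under stated assumptions, not a definitive implementation: the tool calls are plain dicts shaped like OpenAI Chat Completions tool\_calls entries, and get\_weather, TOOLS, run\_one, and dispatch are hypothetical names. No model API is called here.

```python
import asyncio
import json


async def get_weather(city: str) -> dict:
    """Hypothetical async tool; the sleep stands in for network latency."""
    await asyncio.sleep(0.2)
    return {"city": city, "temp_c": 20}  # placeholder data, not real weather


# Hypothetical registry mapping tool names to coroutine functions.
TOOLS = {"get_weather": get_weather}


async def run_one(tool_call: dict) -> dict:
    """Run one call and wrap its result as a role='tool' message."""
    fn = tool_call["function"]
    try:
        args = json.loads(fn["arguments"])
        content = json.dumps(await TOOLS[fn["name"]](**args))
    except Exception as exc:
        # Report the failure to the model instead of failing the whole batch.
        content = json.dumps({"error": str(exc)})
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": content}


async def dispatch(tool_calls: list[dict]) -> list[dict]:
    """Run all independent calls concurrently; gather preserves input order."""
    return await asyncio.gather(*(run_one(tc) for tc in tool_calls))


# Assumed example: two independent calls taken from a single model response.
tool_calls = [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
    {"id": "call_2", "type": "function",
     "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
]

# One role='tool' message per call, ready to append before the next completion.
messages = asyncio.run(dispatch(tool_calls))
```

Catching exceptions per call is a design choice in this sketch: one failing tool still yields a result message for its tool\_call\_id, so the remaining results are not lost.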
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:37:34.335232+00:00— report_created — created