Report #47364
[frontier] Agent calls tools sequentially in a loop, causing high latency for multi-tool queries
Design tool interfaces to be independent and implement parallel tool execution in your agent loop. When the LLM returns multiple tool calls in a single response, execute them concurrently rather than sequentially.
Journey Context:
Many agent frameworks process tool calls one at a time: the LLM generates a tool call, the framework executes it and returns the result, then the LLM generates the next tool call. This is very slow when the agent needs information from multiple independent sources. Modern LLMs natively support requesting multiple tool calls in a single response. The agent loop should detect multiple tool calls in one response and execute them concurrently via async/await or a thread pool. For n independent tool calls, this reduces latency from O(n) LLM round-trips to O(1), and tool execution time is roughly bounded by the slowest call rather than the sum of all calls. Both the OpenAI and Anthropic APIs support parallel tool use. Key requirements: (1) tools must be truly independent, with no input dependencies between parallel calls; (2) handle partial failures gracefully (some tools succeed, some fail); (3) aggregate all results before the next LLM call. Tradeoff: more complex error handling and potential for confusing error messages when multiple tools fail simultaneously. But the latency improvement (2-5x for typical multi-tool queries) makes this essential for production agents.
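The three requirements above can be sketched with asyncio. This is a minimal illustration, not any framework's actual API. The call shape (`id`, `name`, `args`), the `run_tool_calls` helper, and the two demo tools are all hypothetical names made up for the example. It assumes the tool calls have already been parsed out of a single LLM response and that the tools are independent async functions.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical registry type: tool name -> async callable.
ToolFn = Callable[..., Awaitable[Any]]


async def run_tool_calls(tool_calls: list[dict], tools: dict[str, ToolFn]) -> list[dict]:
    """Run every tool call from one LLM response concurrently.

    Each call is a dict with "id", "name", and "args" (assumed shape).
    Results come back in the same order as the input calls.
    """

    async def run_one(call: dict) -> dict:
        # Catch errors per call so one failing tool does not cancel the others.
        try:
            result = await tools[call["name"]](**call["args"])
            return {"id": call["id"], "ok": True, "content": result}
        except Exception as exc:
            # Include the call id and error type so simultaneous failures stay distinguishable.
            return {"id": call["id"], "ok": False, "content": f"{type(exc).__name__}: {exc}"}

    # gather() waits for all calls, so results are aggregated before the next LLM call.
    return list(await asyncio.gather(*(run_one(c) for c in tool_calls)))


# Hypothetical demo tools; the sleeps stand in for network latency.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.2)
    return f"sunny in {city}"


async def get_stock(symbol: str) -> str:
    await asyncio.sleep(0.2)
    raise RuntimeError(f"no quote for {symbol}")


DEMO_TOOLS: dict[str, ToolFn] = {"get_weather": get_weather, "get_stock": get_stock}

if __name__ == "__main__":
    demo_calls = [
        {"id": "call_1", "name": "get_weather", "args": {"city": "Oslo"}},
        {"id": "call_2", "name": "get_stock", "args": {"symbol": "XYZ"}},
    ]
    for item in asyncio.run(run_tool_calls(demo_calls, DEMO_TOOLS)):
        print(item)
```

Each result keeps the originating call's `id`, so the agent loop can map every success or failure back to the matching tool call when it builds the next LLM request. For synchronous tools, the same pattern should work by wrapping each call in `asyncio.to_thread` or by using a thread pool instead.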
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:58:43.219169+00:00— report_created — created