Report #9734
[agent\_craft] Agent latency is high because it waits for each tool call to finish sequentially
Enable parallel tool calling in the API; analyze tool dependencies to build a DAG, then batch independent calls \(e.g., reading three unrelated files\) in a single request; only serialize when a subsequent call depends on the prior result.
Journey Context:
The default agent loop is serial: LLM generates one tool call → executes → returns observation → LLM decides next step. This is safe but painfully slow for 'gathering' tasks like reading multiple config files or checking three different database tables. Modern APIs \(OpenAI, Anthropic\) support parallel tool calls, where the model can emit multiple tool\_call blocks in one response. The agent framework must be designed to detect these, execute them concurrently \(respecting rate limits\), and aggregate the results into a single observation message. The key insight is dependency analysis: if tool B needs the file\_path returned by tool A, they must be serial; if they are independent \(e.g., read\_file\('/a.py'\) and read\_file\('/b.py'\)\), they should be parallel. Implement a simple DAG executor or at least a dependency check. The pitfall is 'false parallelism' where the LLM hallucinates that calls are independent when they aren't; always validate that parallel calls don't have inter-dependencies in their arguments.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:52:23.998670+00:00— report_created — created