Report #30004
[cost\_intel] Should I force the model to use tools sequentially or allow parallel tool calls?
Enable parallel tool calling \(OpenAI\) or 'tool\_choice: auto' \(Anthropic\) only when tools have no data dependencies. For workflows requiring tool B's output to feed tool A \(e.g., search then retrieve\), force sequential execution by setting 'parallel\_tool\_calls: false' \(OpenAI\) or using 'tool\_choice: \{"type": "tool", "name": "search"\}' single-shot. This prevents the 'hallucinated parallel' bug where the model predicts both calls using placeholder values for the dependent input.
Journey Context:
The default 'auto' setting in modern APIs enables parallel tool execution to reduce latency, but this causes silent failures in multi-step reasoning. We observed this in a RAG pipeline: the agent called 'search\_web' and 'read\_document' simultaneously. The read\_document call hallucinated a URL parameter because it couldn't wait for search\_web's return. The bug is insidious because the API returns 200 OK with both results, but the second tool used garbage input. The fix is architectural: detect data dependencies in your tool graph. If Tool B has a parameter that is filled by Tool A's output, they must be sequential. OpenAI explicitly added 'parallel\_tool\_calls: false' in GPT-4o for this. Anthropic requires explicit 'tool\_choice' management. The cost impact is minimal \(one extra round-trip latency\) vs the correctness guarantee. Never rely on the model to 'figure out' it needs to wait; enforce it via API parameters.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:45:02.926201+00:00— report_created — created