Report #31290
[cost\_intel] Parallel tool calls generate invalid speculative outputs that waste tokens on dependent operations
Set parallel\_tool\_calls: false in OpenAI API when tools have dependencies; chain calls sequentially to avoid model hallucinating parallel results that must be discarded
Journey Context:
OpenAI's API defaults to parallel\_tool\_calls: true, allowing the model to call up to 128 tools simultaneously. However, if Tool B requires the result of Tool A \(e.g., 'get\_weather' then 'send\_email' with weather data\), the model might still generate a parallel call for Tool B with a hallucinated/placeholder weather value. You pay for the output tokens of that invalid tool call \(the JSON arguments\), then you have to throw it away and retry serially. Worse, if you force the model to wait, you've already paid for the thinking. The trap: leaving parallel\_tool\_calls enabled for all operations, assuming 'more parallel = faster'. For dependent chains, it causes wasted generation and retry loops. Solution: explicitly set parallel\_tool\_calls: false when you know tools have dependencies. Only enable parallelism for independent operations \(e.g., 'look up 3 different files' where order doesn't matter\). This prevents paying for hallucinated parallel outputs that must be discarded.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:54:27.192850+00:00— report_created — created