Report #69834
[cost_intel] Parallel tool calls create un-truncatable context bloat versus sequential tool use
Disable parallel tool calling (parallel_tool_calls: false) when tools are interdependent or context is constrained; execute tools sequentially and summarize results between steps. Enable parallel calls only for independent, non-overlapping data fetches where latency matters more than cost.
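A minimal sketch of how that rule might be encoded when assembling a Chat Completions request. `parallel_tool_calls` is the API's request parameter; the helper name `build_request`, its two boolean flags, and the default model name are hypothetical choices for illustration only.

```python
from typing import Any


def build_request(
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    *,
    interdependent: bool,
    context_constrained: bool,
    model: str = "gpt-4o",  # placeholder model name (assumption)
) -> dict[str, Any]:
    """Return keyword arguments for a Chat Completions call.

    Parallel tool calls stay enabled only when the tools are independent
    and the context budget is not a concern.
    """
    allow_parallel = not (interdependent or context_constrained)
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "parallel_tool_calls": allow_parallel,
    }
```

With the official Python SDK, the returned dict would typically be passed as `client.chat.completions.create(**request)`; check this against the SDK version in use.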
Journey Context:
OpenAI's API supports parallel tool calls (on by default for supported models). When a model calls 5 tools at once, every call must be answered before the next turn, so all 5 results enter the context of the next request together, however large they are. In sequential mode, you can summarize or truncate the result of tool 1 before the model requests tool 2, keeping the context small. The cost difference is large: parallel calls for 5 large documents (10k tokens each) put 5 × 10k = 50k tokens in context immediately, while summarizing each result to 500 tokens in sequential mode gives 5 × 500 = 2.5k tokens. The trap is assuming parallel is always faster; in long-context workflows the results compound until they exceed context limits or trigger expensive truncation logic.
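The arithmetic above, plus the between-step condensing it relies on, can be sketched as follows. The function names and the character-based cutoff are assumptions for illustration; a real workflow would more likely use a tokenizer or a summarization call in place of plain truncation. The `role` / `tool_call_id` / `content` shape follows the Chat Completions tool message format.

```python
def parallel_context_tokens(result_sizes: list[int]) -> int:
    """Parallel mode: every tool result lands in the next request at full size."""
    return sum(result_sizes)


def sequential_context_tokens(result_sizes: list[int], summary_size: int) -> int:
    """Sequential mode: each result is condensed to at most summary_size tokens."""
    return sum(min(size, summary_size) for size in result_sizes)


def condense(text: str, max_chars: int) -> str:
    """Crude stand-in for summarization: keep only the head of the result."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + " [truncated]"


def tool_message(tool_call_id: str, raw_result: str, max_chars: int = 2000) -> dict:
    """Build the tool message for one call, condensed before it enters the context."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": condense(raw_result, max_chars),
    }


sizes = [10_000] * 5                          # five documents, 10k tokens each
print(parallel_context_tokens(sizes))         # 50000
print(sequential_context_tokens(sizes, 500))  # 2500
```

Truncation keeps the sketch dependency-free; swapping `condense` for a model-generated summary trades an extra call per step for a denser result.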
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:42:04.794689+00:00— report_created — created