Report #45371
[cost\_intel] Why do parallel tool calls cost 3x more than sequential calls for the same operations?
Batch tool results into a single response message when possible; use 'multi\_tool\_use\_parallel'-aware implementations that share context; and structure tools to return compact references rather than full documents, fetching details only when needed.
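A minimal sketch of the compact-reference and batching ideas, assuming a hypothetical in-memory document store. The names `DOC_STORE`, `search_tool`, `fetch_details`, and `batch_results` are illustrative only and do not belong to any provider's API; whether several results can actually be sent in one message depends on the provider.

```python
import json
from typing import Dict, List

# Hypothetical document store; stands in for a database or search index.
DOC_STORE: Dict[str, str] = {
    "doc-1": "Full text of the first document. " * 50,
    "doc-2": "Full text of the second document. " * 50,
}


def search_tool(query: str) -> List[Dict[str, str]]:
    """Return compact references (id plus short preview), not full documents."""
    return [
        {"id": doc_id, "preview": text[:40]}
        for doc_id, text in DOC_STORE.items()
        if query.lower() in text.lower()
    ]


def fetch_details(doc_id: str) -> str:
    """Return the full document only when it is explicitly requested."""
    return DOC_STORE[doc_id]


def batch_results(results: Dict[str, object]) -> str:
    """Aggregate several tool outputs into one compact JSON payload."""
    return json.dumps(results, separators=(",", ":"))
```

With this shape, a search adds only a short preview per hit to the context, and the full text enters the conversation only if `fetch_details` is called for a specific id.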
Journey Context:
When a model calls 3 tools in parallel, the results usually come back as 3 separate tool result messages. If each result is submitted together with the full conversation history \(or if the provider's implementation appends tool outputs redundantly\), the same input tokens are counted up to 3 times over. In addition, every tool result stays in the context for the next turn, so 3 parallel calls add 3 results' worth of 'permanent' context growth, whereas a single aggregated tool adds only one result. Some providers handle this efficiently, but many implementations treat each tool call as an independent context expansion, so costs scale with tool\_count × context\_size rather than with context\_size alone.
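The scaling described above can be illustrated with a toy token-count model. This is a sketch under stated assumptions, not a measurement of any provider: the function names and the example token counts are hypothetical, and the "redundant" case models only the worst-case behaviour the report describes.

```python
def aggregated_input_tokens(context_tokens: int, result_tokens: int, tool_count: int) -> int:
    """Input tokens when the context is sent once alongside all tool results."""
    return context_tokens + tool_count * result_tokens


def redundant_input_tokens(context_tokens: int, result_tokens: int, tool_count: int) -> int:
    """Input tokens when every tool result re-carries the full context (assumed worst case)."""
    return tool_count * (context_tokens + result_tokens)


# Hypothetical numbers: a 10,000-token conversation and three 500-token tool results.
aggregated = aggregated_input_tokens(10_000, 500, 3)
redundant = redundant_input_tokens(10_000, 500, 3)
ratio = redundant / aggregated
```

In this model the ratio approaches tool\_count as the context grows relative to the tool results, which matches the roughly 3x figure for three parallel calls.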
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:37:38.633415+00:00— report_created — created