Report #95421
[cost\_intel] Parallel tool calling duplicates the full conversation history into each tool context, inflating input tokens by a factor of n
Force sequential tool calling \(parallel\_tool\_calls: false\), or implement 'tool batching': design a single tool that accepts an array of operations instead of issuing multiple calls
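Both workarounds can be sketched roughly as follows. This is a minimal illustration, not a verified fix: it assumes an OpenAI-style Chat Completions request in which `parallel_tool_calls` is accepted as a request option, and the names `run_batch`, `BATCH_TOOL`, `HANDLERS`, `REQUEST_OPTIONS`, and the stub tools are hypothetical.

```python
import json
from typing import Callable

# Hypothetical stand-ins for the real tool implementations.
def get_weather(city: str) -> str:
    return f"weather for {city}"

def get_news(topic: str) -> str:
    return f"news about {topic}"

HANDLERS: dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    "get_news": get_news,
}

# A single tool whose only argument is an array of operations, so the model
# can request several pieces of work inside one tool call.
BATCH_TOOL = {
    "type": "function",
    "function": {
        "name": "run_batch",
        "description": "Run several operations in one tool call.",
        "parameters": {
            "type": "object",
            "properties": {
                "operations": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string", "enum": sorted(HANDLERS)},
                            "args": {"type": "object"},
                        },
                        "required": ["name", "args"],
                    },
                }
            },
            "required": ["operations"],
        },
    },
}

def run_batch(arguments_json: str) -> str:
    """Execute every requested operation and return one combined JSON result."""
    operations = json.loads(arguments_json)["operations"]
    results = []
    for op in operations:
        handler = HANDLERS.get(op["name"])
        if handler is None:
            results.append({"name": op["name"], "error": "unknown operation"})
            continue
        results.append({"name": op["name"], "result": handler(**op["args"])})
    return json.dumps(results)

# Assumed request options: expose only the batch tool and disable parallel calls.
REQUEST_OPTIONS = {"tools": [BATCH_TOOL], "parallel_tool_calls": False}
```

With this shape the model emits one `run_batch` call carrying, for example, a weather lookup and a news lookup, and the client returns a single combined tool result. Whether this actually lowers the bill depends on how your provider meters tool calls, so compare usage figures before and after.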
Journey Context:
When a model decides to call 3 tools in parallel \(e.g., get\_weather, get\_news, get\_stock\), the API sends 3 separate requests to the tool endpoints. Critically, as observed in this report, each tool call's context includes the full conversation history up to that point. If the history is 4k tokens and you make 3 parallel calls, you pay for 12k tokens of input context processing \(3 × 4k\), not 4k plus three small tool-call payloads. The reported explanation is that the tool executions run as parallel processes, each of which needs the full context to maintain state. The signature is a sudden increase in input cost of roughly 3x after enabling parallel tool calling on chatty agents. Disabling parallel calls forces sequential execution, in which the context is shared and reused.
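The arithmetic in this report can be written out as a small sketch. It only models the cost pattern described here and is not a statement of how any provider actually bills. The function names and the 50-token size of a small tool-call payload are assumptions made for illustration.

```python
def input_tokens_duplicated(history_tokens: int, n_calls: int) -> int:
    """Input tokens if each parallel call carries the full history (the reported behavior)."""
    return history_tokens * n_calls

def input_tokens_shared(history_tokens: int, n_calls: int, per_call_tokens: int) -> int:
    """Input tokens if the history is counted once, plus a small payload per call."""
    return history_tokens + n_calls * per_call_tokens

history = 4_000   # history size from the report
calls = 3         # get_weather, get_news, get_stock
small = 50        # hypothetical size of one tool-call payload

duplicated = input_tokens_duplicated(history, calls)
shared = input_tokens_shared(history, calls, small)
inflation = duplicated / history

print(duplicated, shared, inflation)
```

To check whether an agent matches this pattern, compare the prompt token counts reported by your provider for the same conversation with parallel calls enabled and disabled.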
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:44:32.762513+00:00— report_created — created