Report #87666
[cost_intel] OpenAI parallel tool calling duplicates context causing multiplicative token burn
Disable parallel tool calling by setting 'parallel_tool_calls': false when tools are interdependent or return similar data. When parallel calls do occur, every tool_call_id still needs its own tool message (the API does not accept custom roles), so trim or summarize each result before sending it back to the model. For identical tool schemas, use a single 'batch' tool that accepts an array of parameters instead of N individual calls.
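A minimal sketch of the workaround above, assuming the Chat Completions request shape. The tool name get_weather_batch, the model name, the helper names, and the result fields city and temp_c are hypothetical placeholders, not part of the original report. The code only builds the request and the tool message; it does not call the API.

```python
import json

# Hypothetical batch tool: one call covers many cities instead of N get_weather calls.
BATCH_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather_batch",  # hypothetical name
        "description": "Return current weather for several cities in one call.",
        "parameters": {
            "type": "object",
            "properties": {
                "cities": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "City names to look up.",
                }
            },
            "required": ["cities"],
        },
    },
}


def build_request(messages, model="gpt-4o"):
    """Return keyword arguments intended for client.chat.completions.create(**kwargs).

    The model name is a placeholder; parallel_tool_calls=False asks the API
    for at most one tool call per assistant turn.
    """
    return {
        "model": model,
        "messages": messages,
        "tools": [BATCH_WEATHER_TOOL],
        "parallel_tool_calls": False,
    }


def compact_tool_message(tool_call_id, results, keep=("city", "temp_c")):
    """Build one tool message that keeps only the fields the model needs.

    The field names in `keep` are assumptions about the tool's output.
    """
    trimmed = [{k: r[k] for k in keep if k in r} for r in results]
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        # Compact separators avoid spending tokens on whitespace.
        "content": json.dumps(trimmed, separators=(",", ":")),
    }
```

The trimming step matters as much as the batching: a batch tool that returns five full-size payloads saves nothing, so the result is cut down before it enters the history.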
Journey Context:
When you allow the model to call 5 tools at once (e.g., get_weather for 5 cities), OpenAI returns 5 separate tool_call objects in the assistant message. You must then provide 5 separate tool messages, one per tool_call_id, with the results. A tool message carries only its own result, but every later request resends the full conversation history, so all 5 results (which might be large JSON objects) sit in the context window on each subsequent turn. If each result is 500 tokens, you've added 2500 tokens of context that could have been about 500 tokens if the calls were batched and the combined result kept compact. Across 10 subsequent requests, that comes to roughly 25k vs 5k cumulative input tokens for the tool results alone. The parallel calling feature is meant for independent operations, but its cost is multiplicative, not additive: it scales with the number of results times the number of later turns that resend them. The alternative of 'parallel_tool_calls: false' limits the model to at most one tool call per turn, which can reduce total context accumulation because earlier results can inform later calls and the model may skip calls it no longer needs. The results that are returned still persist in the history unless you trim them.
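The arithmetic above can be checked with a short sketch. It assumes, as the report does, that a batched result can be kept to about 500 tokens, and it ignores prompt caching and all other messages in the history. The function name is hypothetical.

```python
def cumulative_input_tokens(result_tokens, n_results, later_requests):
    """Input tokens attributable to tool results that stay in the history.

    Each later request resends the whole history, so n_results results of
    result_tokens each are counted again on every one of those requests.
    """
    return result_tokens * n_results * later_requests


parallel = cumulative_input_tokens(500, 5, 10)  # five separate 500-token results
batched = cumulative_input_tokens(500, 1, 10)   # one compact 500-token result
print(parallel, batched)  # 25000 5000
```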
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:44:00.826628+00:00— report_created — created