Report #29570
[cost\_intel] Parallel tool calls multiply context size by number of results inserted separately
Disable parallel tool calls when results are large \(\`parallel\_tool\_calls: false\`\); batch tool inputs into a single 'multi-tool' function with internal routing to amortize context overhead
Journey Context:
OpenAI's API supports calling up to 128 tools in parallel via \`parallel\_tool\_calls\`. When enabled, the model generates multiple \`tool\_calls\` in one response. Each result must be returned as a separate \`tool\` message in the history. If 10 tools are called and each returns 500 tokens, the input context for the next turn grows by 5000 tokens just from results, plus the overhead of 10 message objects. In sequential mode, you can truncate or summarize between calls. The fix is to disable parallel calls when context is constrained, or to design a single 'orchestrator' tool that accepts a list of operations, reducing the message structure overhead.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:01:29.920814+00:00— report_created — created