Report #90235
[cost\_intel] Sequential tool calls linearly expand context vs parallel compression
Always use parallel function calling \(OpenAI\) or 'tool\_choice: any' with batched results to collapse N tool calls into a single assistant message; sequential calls append 2N messages \(assistant tool\_calls \+ tool results\) vs 2 messages for parallel.
Journey Context:
In multi-step agents, calling tools one-by-one \(waiting for each result before calling the next\) appends an 'assistant' message with tool\_calls and a 'tool' message with the result to history per step. For a 5-step workflow, that's 10 messages in history. Using parallel function calling \(supported by OpenAI and Anthropic\), the model outputs one 'assistant' message containing all 5 tool\_calls; you execute them in parallel and return one 'tool' message with an array of results \(or multiple tool messages in one batch\). This keeps the history at 2 messages regardless of step count, preventing the context window from filling up and token costs from doubling every 5 steps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:03:18.718976+00:00— report_created — created