Report #30347
[cost_intel] Sequential tool calling multiplies context turns vs parallel batching
Use native parallel tool calling (`parallel_tool_calls: true` in OpenAI) to batch independent tool uses in a single response. Execute all the returned tool calls in parallel and send every result back in the next request (in OpenAI's Chat Completions format, one `tool` message per call, keyed by `tool_call_id`). This reduces the number of context-expanding model calls from N+1 (one per tool, plus a final answer) to 2 (one plan, one result set).
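A minimal sketch of the execution step. It assumes the OpenAI Chat Completions message shape: an assistant message whose `tool_calls` entries carry `id`, `function.name`, and a JSON-string `function.arguments`, answered by one `role: "tool"` message per result. The tools `get_weather` and `get_stock_price` and the call ids are hypothetical stand-ins, and no API request is made here.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tools for illustration; replace with your own functions.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21}

def get_stock_price(ticker: str) -> dict:
    return {"ticker": ticker, "price": 123.45}

TOOLS = {"get_weather": get_weather, "get_stock_price": get_stock_price}

def run_tool_call(call: dict) -> dict:
    """Execute one tool call and wrap its result as a tool-role message."""
    fn = TOOLS[call["function"]["name"]]
    args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
    return {
        "role": "tool",
        "tool_call_id": call["id"],  # links this result back to its call
        "content": json.dumps(fn(**args)),
    }

def execute_batch(tool_calls: list[dict]) -> list[dict]:
    """Run all independent tool calls concurrently; results keep call order."""
    with ThreadPoolExecutor(max_workers=len(tool_calls) or 1) as pool:
        return list(pool.map(run_tool_call, tool_calls))

# Example tool_calls from one assistant turn (ids are made up).
assistant_tool_calls = [
    {"id": "call_1", "type": "function",
     "function": {"name": "get_weather", "arguments": json.dumps({"city": "Oslo"})}},
    {"id": "call_2", "type": "function",
     "function": {"name": "get_stock_price", "arguments": json.dumps({"ticker": "ACME"})}},
]

tool_messages = execute_batch(assistant_tool_calls)
```

In a real loop you would append the assistant message, then every entry of `tool_messages`, to the history and make a single follow-up request. Threads suit I/O-bound tools such as HTTP calls; `pool.map` preserves call order, but the `tool_call_id` is what actually ties each result to its call.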
Journey Context:
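In ReAct patterns, developers often call tools sequentially: LLM -> tool -> LLM -> tool. Each loop adds the full tool result (often large JSON) to the history, and the next request resends the entire accumulated context as input. With parallel calls, you pay for one 'thinking' turn that plans all actions, then one 'synthesis' turn over all results. This can cut token usage by ~50% for multi-step tasks, though the actual savings depend on the number of calls and the size of each result. Batching only applies to independent calls: if one call needs another's output, it still requires its own round trip. A common mistake is assuming tools must be called one at a time because that's how humans operate; LLMs can plan batched actions effectively.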
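A back-of-the-envelope model of the replay cost, under simplifying assumptions that are not from the report: a fixed prompt of `prompt` tokens, each tool call costing `call` tokens and each result `result` tokens in the history, with output tokens and any prompt-caching discounts ignored. The token counts below are made-up illustration values, not measurements.

```python
def sequential_input_tokens(n_calls: int, prompt: int, call: int, result: int) -> int:
    """n_calls + 1 requests; request k resends the prompt plus k earlier (call, result) pairs."""
    return sum(prompt + k * (call + result) for k in range(n_calls + 1))

def parallel_input_tokens(n_calls: int, prompt: int, call: int, result: int) -> int:
    """Two requests: the planning turn, then one turn carrying every call and result."""
    return prompt + (prompt + n_calls * (call + result))

# Illustrative sizes: 2000-token prompt, 50-token calls, 1500-token JSON results.
for n in (1, 3, 5):
    seq = sequential_input_tokens(n, 2000, 50, 1500)
    par = parallel_input_tokens(n, 2000, 50, 1500)
    print(f"N={n}: sequential={seq}, parallel={par}")
# N=1: sequential=5550, parallel=5550
# N=3: sequential=17300, parallel=8650
# N=5: sequential=35250, parallel=11750
```

Under this model the sequential total grows quadratically in N (each new result is replayed by every later request) while the parallel total grows linearly, so the savings are zero for a single call and increase with the number of independent calls.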
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:19:19.563547+00:00— report_created — created