Report #59381
[cost\_intel] Parallel tool calling forces full context regeneration when retrying partially failed tool batches
Disable parallel tool calls \(parallel\_tool\_calls: false\) when tool operations fail often or are expensive to re-execute. If you still need concurrency, parallelize on the client: have the model call a single 'router' tool that fans out to the sub-operations, so that only the failed sub-operations are retried.
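The router approach can be sketched as follows. This is a minimal illustration, not a reference implementation: the tool name `run_batch`, the `ops` argument shape, and the `handlers` registry are all hypothetical, and the schema assumes an OpenAI-style function tool definition. The model makes one tool call; the client fans out in threads and retries only the sub-operations that failed, so the conversation history only ever contains a single call and a single combined result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

# Hypothetical router tool schema (assumes an OpenAI-style function tool format).
ROUTER_TOOL = {
    "type": "function",
    "function": {
        "name": "run_batch",
        "description": "Run several sub-operations; returns one result per id.",
        "parameters": {
            "type": "object",
            "properties": {
                "ops": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "id": {"type": "string"},
                            "name": {"type": "string"},
                            "args": {"type": "object"},
                        },
                        "required": ["id", "name", "args"],
                    },
                }
            },
            "required": ["ops"],
        },
    },
}

Handler = Callable[..., Any]


def run_batch(
    ops: list[dict[str, Any]],
    handlers: dict[str, Handler],
    max_attempts: int = 3,
    max_workers: int = 8,
) -> dict[str, dict[str, Any]]:
    """Run all ops concurrently, retrying only the ones that raised."""
    results: dict[str, dict[str, Any]] = {}
    pending = list(ops)
    for _ in range(max_attempts):
        if not pending:
            break
        # Leaving the `with` block waits for every submitted future to finish.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {
                op["id"]: pool.submit(handlers[op["name"]], **op["args"])
                for op in pending
            }
        still_failing = []
        for op in pending:
            try:
                results[op["id"]] = {"ok": True, "value": futures[op["id"]].result()}
            except Exception as exc:  # record the failure and retry this op only
                results[op["id"]] = {"ok": False, "error": str(exc)}
                still_failing.append(op)
        pending = still_failing
    return results
```

The combined `results` dict would be serialized as the single tool result for the `run_batch` call. Sub-operations that still fail after `max_attempts` are reported with `"ok": False`, leaving the model to decide whether to ask again.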
Journey Context:
With parallel\_tool\_calls enabled, the model generates an array of tool calls in a single assistant message, say 5 of them. Suppose 4 succeed and 1 fails with a 500 error. You must append the assistant message with its tool\_calls to the history, then append the tool results. To retry the failed call, you have to send that history again, and it still has to include the assistant message that requested all 5 calls along with their results, since the results are matched to the calls in that message. So you pay input tokens again for the 4 calls that already succeeded. Over many turns, this compounds. Order of magnitude: if each tool call is about 100 tokens and a batch of 10 parallel calls has a 50% failure rate, about 500 of the 1,000 call tokens resent on each retry belong to calls that already succeeded, i.e. roughly 50% of the batch is redundant context on every retry, before counting the tool results themselves. The fix is to disable parallel calls when failure rates are high, accepting the higher latency of sequential calls to avoid context bloat on retry.
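The order-of-magnitude estimate above can be reproduced with a small calculation. This is a rough sketch under simplifying assumptions: it counts only the tool-call request tokens in the assistant message (not tool results, system prompt, or earlier turns), and the function name is hypothetical.

```python
def redundant_tokens_per_retry(
    num_calls: int, tokens_per_call: int, failure_rate: float
) -> tuple[int, float]:
    """Return (redundant tokens, share of the batch) resent on one retry.

    Assumes the whole assistant message with all tool calls is resent,
    so every call that already succeeded counts as redundant input.
    """
    if num_calls <= 0:
        raise ValueError("num_calls must be positive")
    succeeded = round(num_calls * (1 - failure_rate))
    batch_tokens = num_calls * tokens_per_call
    redundant = succeeded * tokens_per_call
    return redundant, redundant / batch_tokens


redundant, share = redundant_tokens_per_retry(10, 100, 0.5)
print(redundant, share)  # 500 0.5
```

With a lower failure rate the redundant share per retry is higher (more calls succeeded and are resent), but retries happen less often, so the total overhead depends on both.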
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:09:40.626925+00:00— report_created — created