Report #29789
[cost\_intel] Parallel tool calls causing combinatorial context growth when tool outputs are large, creating 'ping-pong' token burn
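When tool outputs are large, force sequential tool calling so that each result can be truncated or summarized before the next call, rather than accumulating every output at once. \`tool\_choice: 'required'\` alone only forces the model to call some tool; to get one call per turn, disable parallel calls \(\`parallel\_tool\_calls: false\` on OpenAI, \`disable\_parallel\_tool\_use: true\` inside \`tool\_choice\` on Anthropic\) or pin \`tool\_choice\` to one named tool.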
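A minimal sketch of the request options involved, assuming the parameter names from the public OpenAI Chat Completions and Anthropic Messages documentation as I understand them; check the current API reference before relying on them. The helper names and the `read_file` tool are hypothetical.

```python
from typing import Any, Dict, Optional


def openai_sequential_options(tool_name: Optional[str] = None) -> Dict[str, Any]:
    """Extra keyword arguments for an OpenAI Chat Completions request.

    Assumption: `parallel_tool_calls=False` limits the model to one tool
    call per turn, and `tool_choice` accepts either "required" or a dict
    naming one function.
    """
    options: Dict[str, Any] = {"parallel_tool_calls": False}
    if tool_name is None:
        # Force a tool call, but let the model pick which tool.
        options["tool_choice"] = "required"
    else:
        # Pin the call to one specific (hypothetical) tool.
        options["tool_choice"] = {
            "type": "function",
            "function": {"name": tool_name},
        }
    return options


def anthropic_sequential_options(tool_name: Optional[str] = None) -> Dict[str, Any]:
    """Extra keyword arguments for an Anthropic Messages request.

    Assumption: `disable_parallel_tool_use` lives inside `tool_choice`.
    """
    choice: Dict[str, Any]
    if tool_name is None:
        choice = {"type": "any"}
    else:
        choice = {"type": "tool", "name": tool_name}
    choice["disable_parallel_tool_use"] = True
    return {"tool_choice": choice}
```

These dictionaries would be merged into the request alongside `model`, `messages`, and `tools`. Forcing a tool call on every turn prevents the model from giving a final answer, so a caller would likely relax `tool_choice` once enough results are in.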
Journey Context:
Current models \(GPT-4o, Claude 3.5\) support parallel function calling: the model can request several tools in one turn \(e.g., 'get weather for NYC and LA'\). The API returns an array of \`tool\_calls\`, and the developer must return one result per call, matched by call ID, before the model continues. If each result is large \(e.g., two large files\), the next request carries both outputs, and every later request carries them again because the full history is re-sent each turn. If the model then calls two more tools for each of those results, the growth compounds every round \(2, then 4, then 8 large outputs\), on top of everything already accumulated. This 'parallel amplification' is easy to miss: parallel calls do cut wall-clock time for network-bound tools, but that gain says nothing about the token cost of sending every result back at once. The fix is to force sequential tool calling when outputs are large. Note that \`tool\_choice: 'required'\` on its own only forces the model to call some tool; to get one call per turn, disable parallel calls \(\`parallel\_tool\_calls: false\` on OpenAI, \`disable\_parallel\_tool\_use: true\` inside \`tool\_choice\` on Anthropic\) or pin \`tool\_choice\` to a single named tool. Sequencing does not shrink the context by itself; the saving comes from truncating or summarizing each result before it is appended, and from giving the model a chance to stop once it has enough. Alternatively, keep parallel calls and add a 'tool result summarization' step that condenses each result before it is sent back, though this complicates the parallel structure. The tradeoff is latency \(sequential is slower\) versus cost \(with large outputs and fan-out, parallel calls can multiply input tokens every round\).
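The arithmetic behind the 2, 4, 8 pattern can be sketched as follows. This is a rough model, not a measurement: it assumes every request re-sends all earlier tool results and it ignores the system prompt, user messages, and model output tokens. Both function names, the 10,000-token result size, and the truncation marker are illustrative assumptions.

```python
from typing import List


def total_input_tokens(calls_per_round: List[int], tokens_per_result: int) -> int:
    """Sum input tokens over all requests in a tool-calling loop.

    Assumes each request re-sends every tool result produced so far.
    """
    context = 0  # tool-result tokens currently in the conversation
    total = 0    # input tokens billed across all requests
    for calls in calls_per_round:
        context += calls * tokens_per_result
        total += context
    return total


def truncate_tool_output(
    text: str,
    max_chars: int = 2000,
    marker: str = "\n[... truncated ...]\n",
) -> str:
    """Keep the head and tail of a large tool output, dropping the middle."""
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(marker)
    if keep < 2:
        # Not enough room for head + marker + tail; fall back to a hard cut.
        return text[:max_chars]
    head = keep // 2
    tail = keep - head
    return text[:head] + marker + text[-tail:]


# Fan-out from the report: 2, then 4, then 8 results of ~10,000 tokens each.
untrimmed = total_input_tokens([2, 4, 8], 10_000)  # 220_000
# Same call pattern, each result cut to ~1,000 tokens before it is sent back.
trimmed = total_input_tokens([2, 4, 8], 1_000)     # 22_000
```

Under these assumptions the saving comes from shrinking each result, not from the call order. Sequencing matters because it creates a point between calls where `truncate_tool_output` (or a summarizer) can run and where the loop can stop early.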
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:23:34.303333+00:00— report_created — created