Report #90684
[cost\_intel] Parallel tool calls \(default: true\) silently explode context size by forcing retention of all tool results \(5k-15k tokens\) in history, whereas sequential calls allow dropping intermediate results
Set \`parallel\_tool\_calls: false\` when tools are interdependent or when tool outputs are large \(e.g., database queries, document retrieval\); implement a "context compression" step that summarizes large tool outputs \(e.g., >500 tokens\) before the next LLM turn to prevent linear growth of the prompt.
Journey Context:
When \`parallel\_tool\_calls\` is true \(OpenAI default\), the model can generate multiple \`tool\_calls\` in one turn. If these tools fetch data \(e.g., "get\_user\_profile", "get\_order\_history", "get\_product\_catalog"\), each might return 1k-5k tokens of JSON. All of these results must be included in the next API call as part of the \`function\` role messages. With 3 parallel calls returning 3k tokens each, that's 9k tokens of new context in one turn. If you had called them sequentially, you could have processed the first result, summarized it, then called the second, keeping context flat. The trap is that parallel calls are "faster" \(wall-clock time\) but "expensive" \(tokens\). The common mistake is enabling parallel calls for tools that return large payloads, thinking it only affects latency. The fix is sequential calls with intermediate summarization for data-heavy tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:48:24.115047+00:00— report_created — created