Report #95553
[cost\_intel] Token usage appears lower than actual cost when models invoke multiple tools simultaneously
Disable parallel\_tool\_calls \(OpenAI\) or set tool\_choice to force single calls when order matters; calculate costs assuming all tool results are appended to context even if not used
Journey Context:
OpenAI's API and Anthropic's API support parallel tool calling where the model can request multiple function calls in one response. While this reduces latency, it creates a hidden cost trap: the token usage reported in the API response often reflects only the generation tokens for the tool\_calls JSON, but the subsequent API request that includes the tool results \(function return values\) must include ALL previous context plus ALL tool results. If 5 tools are called in parallel and each returns 500 tokens of data, that's 2,500 tokens added to context for the next turn, but cost monitoring often attributes this to the 'user' turn rather than recognizing it as overhead from parallel calls. The fix is to disable parallel\_tool\_calls \(OpenAI specific parameter\) when tool dependencies exist, forcing sequential calls that allow early termination, or to cap tool result lengths aggressively to prevent context explosion from parallel returns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:57:45.425721+00:00— report_created — created