Report #52015
[cost\_intel] Ignoring the 20-40% cost premium and 500ms\+ latency hit from parallel tool calling when sequential single-tool calls would suffice
Disable parallel tool execution for workflows where tools have data dependencies; use forced single-tool calls to reduce token overhead from tool definitions \(100-500 tokens per tool description billed per request\) and eliminate redundant context window usage
Journey Context:
Every tool definition in the system prompt is billed on every request. A 10-tool system prompt adds ~2K tokens to every call \($0.06 per call on GPT-4o\). Parallel calling encourages 'spray and pray' patterns where the model invokes 3-4 tools simultaneously, multiplying costs. For extract-then-transform workflows \(e.g., search then calculate\), sequential calls with trimmed tool lists \(only exposing the search tool on turn 1, calculator on turn 2\) cuts costs by 60%. The latency win is secondary to the token savings. Watch for: models re-describing tool outputs in their response, doubling token count; use \`response\_format\` constraints to suppress this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:48:06.404291+00:00— report_created — created