Report #56218
[cost_intel] Parallel tool call overhead inflating prompt tokens by 30-50% per additional tool due to JSON schema repetition
Consolidate the tools into a single 'router' tool with an 'action' enum and a discriminated union; keep each description under 50 characters
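A minimal sketch of that consolidation, assuming an OpenAI-style function tool layout. The action names ('search', 'fetch', 'summarize'), their fields, and the helper 'build_router_tool' are hypothetical and only illustrate the shape. Support for oneOf inside tool parameters varies by provider and mode, so verify it against your API before relying on it.

```python
import json

# Hypothetical actions and their parameters; replace with your real tools.
ACTIONS = {
    "search": {"query": {"type": "string"}},
    "fetch": {"url": {"type": "string"}},
    "summarize": {"text": {"type": "string"}, "max_words": {"type": "integer"}},
}


def build_router_tool(actions):
    """Fold several tools into one definition keyed by an 'action' field."""
    variants = []
    for name, props in actions.items():
        # One branch of the discriminated union per action; the constant
        # 'action' value is the discriminator.
        variants.append({
            "type": "object",
            "properties": {"action": {"const": name}, **props},
            "required": ["action", *props],
            "additionalProperties": False,
        })
    return {
        "type": "function",
        "function": {
            "name": "router",
            # Terse on purpose: well under 50 characters.
            "description": "Run one action; see 'action' enum.",
            "parameters": {
                "type": "object",
                "properties": {
                    "action": {"type": "string", "enum": list(actions)},
                },
                "required": ["action"],
                "oneOf": variants,
            },
        },
    }


router_tool = build_router_tool(ACTIONS)
# Character count of the compact JSON, as a rough proxy for schema size.
print(len(json.dumps(router_tool, separators=(",", ":"))))
```

The printed character count is only a proxy. Actual token usage depends on how the provider serializes tool definitions, so compare the prompt token counts reported by the API before and after the change.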
Journey Context:
When you define multiple tools for parallel function calling, each tool's JSON schema is embedded in the prompt on every request and billed as input tokens. A 5-tool suite with OpenAPI-style descriptions can consume 8,000-12,000 tokens before any user input. On GPT-4o at $5 per 1M input tokens, that is $0.04-0.06 per request for tool definitions alone. Even when the tools are actually called, their results often add fewer tokens than the schema overhead. Consolidating into a single 'universal_tool' with a discriminated union (oneOf) and terse descriptions means you pay the per-tool schema overhead once: the 5-tool overhead drops from roughly 10k tokens to 2k tokens, which saves about $0.04 per request, or 80% of tool definition costs.
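The arithmetic behind those figures can be checked with a short sketch. The token counts and the $5 per 1M price are the ones quoted in this report; the function name 'schema_cost_usd' is hypothetical.

```python
def schema_cost_usd(tokens, usd_per_million_tokens):
    """Cost of sending `tokens` prompt tokens at the given price."""
    return tokens * usd_per_million_tokens / 1_000_000


PRICE = 5.0            # USD per 1M input tokens, as quoted in this report
TOKENS_BEFORE = 10_000  # 5 separate tool definitions (report's estimate)
TOKENS_AFTER = 2_000    # single consolidated tool (report's estimate)

before = schema_cost_usd(TOKENS_BEFORE, PRICE)
after = schema_cost_usd(TOKENS_AFTER, PRICE)
saved = before - after
reduction = (TOKENS_BEFORE - TOKENS_AFTER) / TOKENS_BEFORE

print(f"before: ${before:.2f}  after: ${after:.2f}")
print(f"saved per request: ${saved:.2f} ({reduction:.0%})")
```

Swap in the prompt token counts your API reports for your own tool set; the 10k and 2k values are estimates from this report, not measurements.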
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:51:23.078065+00:00— report_created — created