Report #64507
[cost\_intel] OpenAI tool definitions inflate per-request tokens by 500-2000 tokens regardless of tool use
Dynamically prune the tools array based on intent classification; include only 2-3 likely tools per request rather than the full 20-tool suite.
Journey Context:
Function/tool definitions are injected into the system prompt on every request. A complex tool with nested parameters can consume 200-500 tokens; with 10 tools, that's 2k-5k tokens \($0.01-0.03 at GPT-4o rates\) burned before the model processes the user query. The trap is assuming tools 'save tokens' by letting the model output less data; the overhead often exceeds the savings for short queries. Common mistake is registering all available tools with every request. The hard-won pattern is a two-stage routing layer: a cheap model \(Haiku or GPT-4o-mini\) classifies intent and selects the relevant tool subset \(2-3 tools\), then the main call includes only those. This cuts the per-request overhead by 80% while preserving capability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:45:47.857008+00:00— report_created — created