Report #61705
[cost\_intel] Why does adding function calling triple my token usage even when the model never calls a tool?
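Dynamically select which tools to include based on conversation state; omit tool definitions from the request on turns where tools aren't needed \('tool\_choice: none' blocks tool calls but typically does not remove the schemas from the prompt\); move tool schemas to a cached system prompt block when possible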
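A minimal sketch of the dynamic-selection idea above, in Python. Everything here is an assumption made for illustration: the `TOOL_REGISTRY` contents, the `select_tools` and `build_request` helpers, the `example-model` name, and the keyword matching, which stands in for whatever intent detection an application actually uses. The schemas follow the general shape of an OpenAI-style `tools` array, and no provider SDK is called.

```python
import re
from typing import Any

# Hypothetical registry: trigger keywords plus a schema in the general
# shape of an OpenAI-style "tools" entry. Names and keywords are made up.
TOOL_REGISTRY: dict[str, dict[str, Any]] = {
    "crm_lookup": {
        "keywords": {"customer", "account", "crm"},
        "schema": {
            "type": "function",
            "function": {
                "name": "crm_lookup",
                "description": "Look up a customer record by email.",
                "parameters": {
                    "type": "object",
                    "properties": {"email": {"type": "string"}},
                    "required": ["email"],
                },
            },
        },
    },
    "run_code": {
        "keywords": {"code", "script", "python"},
        "schema": {
            "type": "function",
            "function": {
                "name": "run_code",
                "description": "Execute a short code snippet.",
                "parameters": {
                    "type": "object",
                    "properties": {"source": {"type": "string"}},
                    "required": ["source"],
                },
            },
        },
    },
}


def select_tools(user_message: str) -> list[dict[str, Any]]:
    """Return only the schemas whose keywords appear in the message."""
    words = set(re.findall(r"[a-z]+", user_message.lower()))
    return [
        entry["schema"]
        for entry in TOOL_REGISTRY.values()
        if entry["keywords"] & words
    ]


def build_request(
    messages: list[dict[str, str]], model: str = "example-model"
) -> dict[str, Any]:
    """Build request kwargs, leaving out the "tools" key when nothing matches."""
    request: dict[str, Any] = {"model": model, "messages": messages}
    tools = select_tools(messages[-1]["content"])
    if tools:
        request["tools"] = tools
    return request
```

With this sketch, a chitchat turn produces a request with no `tools` key at all, so no schema tokens are sent, while a message mentioning a customer account carries only the `crm_lookup` schema.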
Journey Context:
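OpenAI and Anthropic inject the full JSON schema of every defined tool into the context window on every request that carries the tool definitions, regardless of whether the model invokes them. For 10 complex tools with detailed schemas, this is 2000-4000 tokens per request. Developers often use a 'tool dump' pattern \(define all possible tools upfront\) and end up paying for CRM tool schemas during casual chitchat turns. The schema overhead is a fixed cost per request, so on its own it grows linearly with turn count; it is the resent conversation history that makes cumulative input grow quadratically. In a 20-turn conversation, turn 20 includes 19 turns of history plus 4k tokens of tool definitions, and across all 20 turns those definitions alone add up to 80,000 input tokens. When individual turns are short, that fixed block dominates the bill, which is how total usage can triple even though no tool is ever called. The fix: leave tool definitions out of the request entirely when the conversation is off-topic, or dynamically inject only relevant tools \(e.g., only include 'code\_interpreter' when the user asks for code\). Note that 'tool\_choice: none' stops the model from calling tools, but the definitions are generally still sent and billed while they remain in the request, so check its effect on input token counts with your provider before relying on it. The report estimates that this cuts tool overhead by about 90% in multi-turn conversations without noticeable quality loss; treat that figure as unverified.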
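The arithmetic can be checked with a short Python sketch. It is a simplified cost model, not a provider's billing formula: the `cumulative_input_tokens` helper is hypothetical, and the figure of 200 tokens per turn \(user message plus assistant reply\) is an assumption chosen for illustration; only the 20 turns and the 4k-token schema block come from the report.

```python
def cumulative_input_tokens(
    turns: int, tokens_per_turn: int, tool_schema_tokens: int
) -> int:
    """Sum input tokens over a conversation where every request resends
    the full history plus a fixed block of tool definitions."""
    total = 0
    for turn in range(1, turns + 1):
        history = (turn - 1) * tokens_per_turn  # all earlier turns, resent
        total += history + tokens_per_turn + tool_schema_tokens
    return total


# Assumed: 200 tokens per turn. From the report: 20 turns, 4k schema tokens.
without_tools = cumulative_input_tokens(20, 200, 0)    # 42000
with_tools = cumulative_input_tokens(20, 200, 4000)    # 122000
print(with_tools / without_tools)
```

Under these assumptions the schemas account for 80,000 of the 122,000 input tokens, a ratio of roughly 2.9 against the same conversation without tools. Longer turns shrink that ratio, and larger schemas grow it.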
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:03:45.176832+00:00— report_created — created