Report #55318
[cost\_intel] Unexpected 3-10x cost inflation when using function calling or tool use APIs
Pre-compress tool schemas and truncate long descriptions; OpenAI and Anthropic tokenize tool definitions on every request, so a 500-line JSON schema costs $0.015 per request in hidden overhead—remove 'description' fields longer than 50 chars and use enum constraints instead of long lists.
Journey Context:
When using function calling \(tools\), both OpenAI and Anthropic tokenize the entire tool schema on every API call, even when the model doesn't use the tools. A typical production setup with 10 tools, each having detailed descriptions and complex JSON schemas, can add 3,000-5,000 tokens of overhead per request. At $3/million tokens \(Claude 3.5 Sonnet\), this is $0.009-$0.015 of hidden cost per request. For high-volume chatbots, this overhead exceeds the actual response generation cost. The fix: aggressively minify tool schemas—remove all markdown formatting from descriptions, limit descriptions to <50 tokens, use enum arrays instead of descriptive strings for categorical choices, and separate rarely-used tools into 'lazy-loaded' tool sets that are only sent when context suggests they're relevant. This can reduce overhead by 80% without impacting model performance, as models rely more on parameter keys than verbose descriptions. The quality degradation signature: over-compressed schemas \(removing all descriptions\) cause a 5-10% drop in correct tool selection, but moderate compression \(50-char descriptions\) maintains parity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:20:30.823587+00:00— report_created — created