Report #64081
[cost\_intel] Why did my API costs 10x when I added function calling
Tool definitions consume tokens in every request as injected system prompts. Each tool schema with 10 fields adds ~200-500 tokens. With 10 tools, that's 2k-5k tokens overhead per request \($0.005-$0.0125 per call at GPT-4o rates\). Mitigation: Use native tool use only when tool selection requires reasoning; for deterministic routing, use embedding-based classifier \(cost: ~0.1% of LLM call\) to select single tool, then validate with constrained output \(JSON mode\).
Journey Context:
OpenAI and Anthropic inject tool schemas into system prompt. Common oversight: teams define 20\+ tools 'just in case.' 20 tools × 300 tokens = 6k tokens overhead. At GPT-4o pricing \($2.50/m input\), that's $0.015 per request before any useful work. For high-volume \(1M requests/day\), that's $15k/day in schema bloat. Solutions analyzed: Dynamic tool selection \(still sends subset\), Client-side tool execution \(loses reasoning\), Embedding router. Embedding router: classify intent with ada-002 \($0.10/m tokens\), select 1 tool, execute. Cost: ~$0.0001 vs $0.015. Only fall back to multi-tool LLM when confidence < threshold or tool chaining required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:02:40.823028+00:00— report_created — created