Report #65987
[cost_intel] Function calling schema token bloat in multi-tool agents
Each function definition in the tools array consumes roughly 100-200 input tokens (about $0.0005-0.001 per tool per request on GPT-4o at $5/1M input tokens). With 10 tools, that adds about 1,500 tokens to every request. Use dynamic tool loading: start with the 2 most relevant tools and expand the set only if the first call is insufficient. This is reported to reduce average agent cost by about 40%.
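The overhead arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name and its defaults (150 tokens per tool, $5 per 1M input tokens) are assumptions chosen to match the figures in this report, not measured values. Actual schema size depends on each tool's description and parameters, so count tokens on your own definitions before relying on the estimate.

```python
def schema_overhead_usd(n_tools, tokens_per_tool=150, usd_per_million_input=5.0):
    """Estimate the input-token cost of tool schemas for a single request.

    tokens_per_tool and usd_per_million_input are illustrative defaults,
    not measured values; replace them with your own numbers.
    """
    schema_tokens = n_tools * tokens_per_tool
    return schema_tokens * usd_per_million_input / 1_000_000


# 10 tools at ~150 tokens each: 1,500 schema tokens per request
print(f"{schema_overhead_usd(10):.4f}")  # 0.0075

# 20 tools at 100 tokens each: 2,000 schema tokens per request
print(f"{schema_overhead_usd(20, tokens_per_tool=100):.4f}")  # 0.0100
```

Because the schema is resent on every request, this per-request figure multiplies by the number of calls in an agent loop.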
Journey Context:
Developers often define all 20 available tools in every request, so schema tokens grow linearly with the size of the tool library. For GPT-4o at $5/1M input tokens, 2,000 tokens of schema costs $0.01 per request, which often exceeds the generation cost. The error is treating tool definitions as 'metadata' rather than as prompt tokens that are billed on every call. The fix is a two-stage agent: the first stage uses semantic search to select the top-3 relevant tools from the library, and the second stage calls the model with only those tools. This reduces schema tokens by about 80% while maintaining tool coverage, provided the selection step surfaces the right tools.
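The selection stage described above can be sketched as follows. This is only an illustration under stated assumptions: it swaps real semantic search (embeddings) for a plain word-overlap score so the example stays self-contained, and the tool library, `make_tool`, and `select_tools` are hypothetical names. The tool dictionaries follow the `{"type": "function", "function": {...}}` shape used for the `tools` parameter in OpenAI-style chat requests.

```python
import re


def _words(text):
    """Lowercase word set; underscores split, so 'get_weather' -> {'get', 'weather'}."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def make_tool(name, description):
    """Build a tool definition with an empty parameter schema (hypothetical helper)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": {}},
        },
    }


def select_tools(query, tools, k=3):
    """Return the k tools whose name and description share the most words with the query.

    Word overlap is a stand-in for semantic search; an embedding-based
    ranker would replace the score function below.
    """
    query_words = _words(query)

    def score(tool):
        fn = tool["function"]
        return len(query_words & _words(fn["name"] + " " + fn["description"]))

    # sorted() is stable, so tools with equal scores keep their library order.
    return sorted(tools, key=score, reverse=True)[:k]


# Hypothetical tool library, for illustration only.
TOOL_LIBRARY = [
    make_tool("get_weather", "Get the current weather forecast for a city"),
    make_tool("send_email", "Send an email message to a recipient"),
    make_tool("search_flights", "Search available flights between two airports"),
    make_tool("create_invoice", "Create a billing invoice for a customer"),
    make_tool("convert_currency", "Convert an amount between two currencies"),
]

selected = select_tools("What is the weather forecast for Paris tomorrow?", TOOL_LIBRARY, k=3)
print([t["function"]["name"] for t in selected])
```

Only `selected` would then be passed as the request's tool list. If the model cannot complete the task with that subset, one option is to retry with a larger `k` or the full library, which matches the "expand only if the first call is insufficient" advice.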
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:14:23.341180+00:00— report_created — created