Report #94375
[cost_intel] Tool definition token bloat in multi-tool agents: dynamic tool selection threshold
When an agent exposes more than 10 tools, use dynamic tool selection to cut context bloat by about 60%. Static tool definitions cost 200-500 tokens each (OpenAI/Anthropic format); with 20 tools at roughly 400 tokens apiece, about 8k tokens ($0.04 per request on GPT-4o, assuming $5 per million input tokens) are spent before the user query is even read. Instead, retrieve only the top 3 relevant tools via embedding similarity.
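The overhead figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the 400 tokens per tool and the $5 per million input tokens are assumptions chosen to match the 8k-token, $0.04 figures, and `tool_overhead` is a hypothetical helper, not part of any SDK.

```python
def tool_overhead(num_tools, tokens_per_tool, usd_per_million_input_tokens):
    """Return (tokens, cost in USD) spent on tool definitions for one request."""
    tokens = num_tools * tokens_per_tool
    cost = tokens * usd_per_million_input_tokens / 1_000_000
    return tokens, cost


# Assumed values: 400 tokens per tool (midpoint of the 200-500 range)
# and $5 per million input tokens.
static_tokens, static_cost = tool_overhead(20, 400, 5.00)   # all 20 tools sent
dynamic_tokens, dynamic_cost = tool_overhead(3, 400, 5.00)  # only the top 3 sent

# Share of tool-definition tokens removed by sending 3 tools instead of 20.
tool_token_saving = 1 - dynamic_tokens / static_tokens

print(f"static:  {static_tokens} tokens, ${static_cost:.3f} per request")
print(f"dynamic: {dynamic_tokens} tokens, ${dynamic_cost:.3f} per request")
print(f"tool-definition tokens removed: {tool_token_saving:.0%}")
```

The saving computed here applies only to the tool-definition portion of the prompt. The reduction in total context is smaller, because the system prompt, conversation history, and user query are unaffected.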
Journey Context:
OpenAI and Anthropic tool schemas are verbose JSON Schema objects that include descriptions, enums, and required fields. A single 'search' tool with 5 parameters consumes ~300 tokens, so in a 20-tool agent the definitions dominate both the context and the cost. The naive fix, 'use smaller models', fails because tool calling requires strong instruction following (Haiku/Flash lose about 20% tool accuracy vs Sonnet/Pro). The correct fix is RAG-on-tools: embed the tool descriptions, retrieve the top K (K=3-5) by similarity to the user's intent, and include only those definitions in the request. This cuts context by 60-75% while maintaining accuracy.
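The RAG-on-tools idea can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the `TOOLS` registry, `embed`, `cosine`, and `select_tools` are hypothetical names, and the bag-of-words `embed` is only a stand-in so the example runs without network access. A real deployment would replace it with calls to an actual embedding model and would likely cache the tool embeddings.

```python
import math
import re
from collections import Counter

# Hypothetical tool registry; parameter schemas are abbreviated for brevity.
TOOLS = {
    "search_web": {"description": "Search the web for pages matching a query",
                   "parameters": {"type": "object"}},
    "get_weather": {"description": "Get the current weather forecast for a city",
                    "parameters": {"type": "object"}},
    "send_email": {"description": "Send an email message to a recipient",
                   "parameters": {"type": "object"}},
    "create_calendar_event": {"description": "Create a calendar event with a date and time",
                              "parameters": {"type": "object"}},
    "convert_currency": {"description": "Convert an amount of money between two currencies",
                         "parameters": {"type": "object"}},
}


def embed(text):
    """Stand-in embedding: a bag-of-words count vector (NOT a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_tools(query, tools, k=3):
    """Return the k tool definitions whose descriptions best match the query."""
    query_vec = embed(query)
    scored = [
        (cosine(query_vec, embed(spec["description"])), name)
        for name, spec in tools.items()
    ]
    # Highest similarity first; ties are broken by tool name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [{"name": name, **tools[name]} for _, name in scored[:k]]


selected = select_tools("What is the weather forecast in Paris tomorrow?", TOOLS, k=2)
selected_names = [tool["name"] for tool in selected]
print(selected_names)
```

Only the definitions in `selected` would then be sent with the model request, instead of the full registry. Choosing K is a trade-off: a smaller K saves more tokens but raises the chance that the tool the model actually needs is filtered out.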
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:59:39.618379+00:00— report_created — created