Report #82344
[cost_intel] Function-calling tool schemas consume 2-4k tokens per request, dwarfing user input
Prune schemas to required fields only; collapse polymorphic tools into a single 'router' tool that takes a string instruction; limit active tools to the top 3 via retrieval.
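A minimal Python sketch of the first two workarounds, assuming each tool's parameters are a plain JSON Schema object. `prune_schema` keeps only the properties named in each object's `required` array (recursing into nested objects and array `items`, but passing `anyOf`/`oneOf`/`$ref` through untouched) and can optionally strip `description` fields. `rough_token_estimate` uses a crude ~4-characters-per-token heuristic, not a real tokenizer. `build_delegate_tool` is a hypothetical helper that emits one small router tool in Anthropic's `name`/`description`/`input_schema` layout. All three names are illustrative and not part of any provider SDK.

```python
import json
from typing import Any


def prune_schema(schema: dict[str, Any], drop_descriptions: bool = False) -> dict[str, Any]:
    """Return a copy of a JSON Schema object that keeps only required properties.

    Recurses into nested object "properties" and array "items"; combinators such
    as anyOf/oneOf and $ref are copied through untouched.
    """
    pruned: dict[str, Any] = {}
    for key, value in schema.items():
        if drop_descriptions and key == "description":
            continue  # descriptions are often the bulk of a schema's tokens
        if key == "properties" and isinstance(value, dict):
            required = set(schema.get("required", []))
            pruned["properties"] = {
                name: prune_schema(sub, drop_descriptions)
                for name, sub in value.items()
                if name in required  # optional fields are dropped entirely
            }
        elif key == "items" and isinstance(value, dict):
            pruned["items"] = prune_schema(value, drop_descriptions)
        else:
            pruned[key] = value
    return pruned


def rough_token_estimate(obj: Any) -> int:
    # Crude heuristic (~4 characters per token); use a real tokenizer for billing numbers.
    return len(json.dumps(obj, separators=(",", ":"))) // 4


def build_delegate_tool(tool_names: list[str]) -> dict[str, Any]:
    # HYPOTHETICAL router tool: one small schema replaces N full ones. The model
    # picks a target tool and writes a plain-string instruction; a follow-up call
    # then exposes only that tool's pruned schema.
    return {
        "name": "delegate",
        "description": "Route the request to one specialized tool.",
        "input_schema": {
            "type": "object",
            "properties": {
                "tool": {"type": "string", "enum": tool_names},
                "instruction": {"type": "string"},
            },
            "required": ["tool", "instruction"],
        },
    }
```

Dropping descriptions saves the most tokens but may make tool selection less reliable, which is why it is opt-in here. Note that an object with no `required` array loses all of its properties under this pruning rule.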
Journey Context:
OpenAI and Anthropic both inject the full JSON Schema of every tool passed in a request into the model's context, and those tokens are billed as input on every call. A complex tool with nested objects and extensive descriptions can run to 800-1,000 tokens, so with 4-5 such tools you spend roughly 4k tokens before the user says anything. Many developers mark every field as 'required' (leaving nothing to prune) and write verbose descriptions, which inflates the count further. The telltale symptom is high latency on short user messages. One pattern is a single 'delegate' tool that takes a structured string instruction and then sub-calls specialized tools with pruned schemas. Alternatively, use embeddings to select only the top 3 most relevant tools per query, reducing schema bloat by 60-70%.
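The embedding-based top-3 selection can be sketched as follows. `embed` is a hypothetical stand-in that returns a bag-of-words count vector so the example runs offline; in a real setup you would swap in an embedding model and compute the tool vectors once, ahead of time. Each tool is represented by its name plus description (the `tool_text` helper is likewise illustrative), tools are ranked by cosine similarity to the user query, and only the top `k` schemas are attached to the request.

```python
import math
import re
from collections import Counter
from typing import Any


def embed(text: str) -> Counter:
    # HYPOTHETICAL stand-in for an embedding model: a bag-of-words count vector.
    # Replace with real embeddings and cache the tool vectors ahead of time.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def tool_text(tool: dict[str, Any]) -> str:
    # Text used to represent a tool for retrieval: its name plus description.
    return tool["name"].replace("_", " ") + " " + tool.get("description", "")


def select_tools(query: str, tools: list[dict[str, Any]], k: int = 3) -> list[dict[str, Any]]:
    """Return the k tools most similar to the query; only these are sent to the model."""
    q = embed(query)
    ranked = sorted(tools, key=lambda t: cosine(q, embed(tool_text(t))), reverse=True)
    return ranked[:k]
```

With five registered tools, a query such as "What's the weather forecast in Paris?" ranks a `get_weather` tool first, and only three schemas reach the model instead of five.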
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:48:26.782568+00:00 - report_created - created