Report #58245
[cost_intel] Function/tool definitions consume a large, hidden token budget on every call, often costing more than the tool saves
Dynamically prune the tools array to the 2-3 tools relevant to each turn (selected by a cheap classifier or heuristics); compress each description to under 50 tokens, for example with LLM-based summarization; prefer 'strict' JSON schemas over verbose natural-language descriptions.
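A minimal sketch of the heuristic variant of this pruning step, assuming tools are stored in the common `{"type": "function", "function": {...}}` layout. The function name `select_tools`, the word-overlap scoring, and the example tools are illustrative assumptions, not part of any provider SDK; a router model could replace the scoring function.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase alphanumeric words; underscores and punctuation act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(query: str, tools: list[dict], k: int = 3) -> list[dict]:
    """Return at most k tools whose name/description share words with the query.

    Hypothetical heuristic: score = number of shared words. Tools with a
    score of zero are dropped, so the result may be empty.
    """
    query_words = _words(query)
    scored = []
    for index, tool in enumerate(tools):
        fn = tool["function"]
        tool_words = _words(fn["name"] + " " + fn.get("description", ""))
        scored.append((len(query_words & tool_words), index, tool))
    # Highest score first; ties keep the original order of the tools list.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for score, _, tool in scored[:k] if score > 0]


# Example tools (made up for illustration).
EXAMPLE_TOOLS = [
    {"type": "function", "function": {
        "name": "get_weather",
        "description": "Current weather and forecast for a city.",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}},
    }},
    {"type": "function", "function": {
        "name": "search_orders",
        "description": "Look up customer orders by id or email.",
        "parameters": {"type": "object", "properties": {"order_id": {"type": "string"}}},
    }},
    {"type": "function", "function": {
        "name": "send_email",
        "description": "Send an email to a recipient.",
        "parameters": {"type": "object", "properties": {"to": {"type": "string"}}},
    }},
]

picked = select_tools("What is the weather forecast in Paris?", EXAMPLE_TOOLS, k=2)
print([tool["function"]["name"] for tool in picked])
```

Only the pruned list would then be passed as the request's tools array. When nothing matches, the result is empty, so the caller has to decide whether to send no tools or fall back to a small default set.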
Journey Context:
Every tool definition is injected into the system prompt (or an equivalent part of the context window) on every API call. A complex tool with a 200-token description and a nested JSON schema can consume 500+ tokens. Offer 10 such tools and that is roughly 5k tokens per request, which can cost more than the tool execution itself saves. We experimented with sending all tools 'just in case', and costs spiked 4x. The fix is aggressive context management: use a tiny 'router' model (e.g., Haiku or GPT-4o-mini) to pick the top-k tools. Caching the tool descriptions client-side and sending only their IDs would be cheaper still, but that requires provider support, and none offer it yet, so compression is key. Watch for 'prompt_tokens' in the usage data skyrocketing despite short user queries.
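The arithmetic above can be checked with a rough sketch like the following. The 4-characters-per-token ratio is a crude assumption, not a real tokenizer, and the price passed to `overhead_cost` is a placeholder; the authoritative figure is the prompt token count the provider reports in its usage data.

```python
import json

# Rough assumption for English/JSON text; real tokenizers will differ.
CHARS_PER_TOKEN = 4


def estimate_tool_tokens(tools: list[dict]) -> int:
    """Approximate the tokens consumed by a serialized tools array."""
    serialized = json.dumps(tools, separators=(",", ":"))
    return len(serialized) // CHARS_PER_TOKEN


def overhead_cost(tokens_per_request: int, requests: int,
                  usd_per_million_input_tokens: float) -> float:
    """Input-token cost attributable to tool definitions alone."""
    return tokens_per_request * requests / 1_000_000 * usd_per_million_input_tokens


# Figures from the report: about 500 tokens per complex tool.
all_tools = 10 * 500   # 5,000 tokens per request with every tool attached
pruned = 3 * 500       # 1,500 tokens per request with three tools attached

# Placeholder price of 1.0 USD per million input tokens, over 1,000 requests.
print(overhead_cost(all_tools, 1000, 1.0))
print(overhead_cost(pruned, 1000, 1.0))
```

Comparing `estimate_tool_tokens` for the full and pruned arrays against the reported prompt tokens for a short query gives a quick sense of how much of each request is tool definitions.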
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:15:11.723403+00:00— report_created — created