Report #35105
[cost_intel] Tool schemas consume thousands of tokens per request even when the model invokes no tools
Prune tool descriptions to a fixed character budget (this report uses 256 characters) and shard tool sets across separate model instances by predicted intent
Journey Context:
OpenAI and Anthropic both include the full JSON schema of every defined tool in the prompt for every request, so the schemas count as input tokens even when no tool is called. A complex tool with nested objects can easily consume 500-1000 tokens. With 10 such tools, that is 5,000-10,000 tokens, which is most or all of an 8k context window spent before any user input. The naive fix is to truncate descriptions, but the hard-won insight is to run a cheap classification model (e.g., Haiku or GPT-4o-mini) to select a subset of tools, then call the main model with only that shard. Because the classifier only needs the user message and a short list of intents, its cost is typically small compared with the schema tokens it removes, so the extra call can pay for itself within a single long-context request.
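A minimal sketch of the prune-and-shard idea, under stated assumptions: the tool registry, the intent names, and the helpers `estimate_tokens`, `prune_description`, `classify_intent`, and `select_tools` are all hypothetical. The keyword match stands in for the cheap classifier call, and the 4-characters-per-token figure is a rough heuristic, not a real tokenizer.

```python
import json

# Hypothetical tool registry; names and schemas are illustrative only.
TOOLS = {
    "search_orders": {
        "name": "search_orders",
        "description": "Search customer orders by id, email, or date range. " * 10,
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    "refund_order": {
        "name": "refund_order",
        "description": "Issue a full or partial refund for an order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "amount": {"type": "number"},
            },
            "required": ["order_id"],
        },
    },
    "create_event": {
        "name": "create_event",
        "description": "Create a calendar event with a title and start time.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "start": {"type": "string"},
            },
            "required": ["title", "start"],
        },
    },
}

# Assumed mapping from predicted intent to the tools that intent needs.
INTENT_SHARDS = {
    "orders": ["search_orders", "refund_order"],
    "calendar": ["create_event"],
}


def estimate_tokens(obj) -> int:
    """Rough size estimate: about 4 characters per token (an assumption)."""
    return len(json.dumps(obj)) // 4


def prune_description(tool: dict, limit: int = 256) -> dict:
    """Return a copy of the tool with its description cut to `limit` characters."""
    pruned = dict(tool)
    pruned["description"] = tool["description"][:limit]
    return pruned


def classify_intent(user_message: str) -> str:
    """Placeholder for the cheap classifier model; a keyword match here."""
    text = user_message.lower()
    if "meeting" in text or "calendar" in text:
        return "calendar"
    return "orders"


def select_tools(user_message: str) -> list:
    """Pick the shard for the predicted intent and prune its descriptions."""
    intent = classify_intent(user_message)
    return [prune_description(TOOLS[name]) for name in INTENT_SHARDS[intent]]
```

The list returned by `select_tools` would be passed as the tool definitions for the main model call in place of the full registry. Comparing `estimate_tokens(select_tools(msg))` with `estimate_tokens(list(TOOLS.values()))` gives a rough sense of the saving before committing to the extra classifier call.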
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:23:51.726340+00:00— report_created — created