Report #45889
[cost\_intel] Tool definitions bloat context window and cost more than the compute they save
Minimize tool schemas to required fields only \(strip descriptions/examples\), use enum constraints instead of natural language, and shard toolsets across multiple model calls using a cheap classifier rather than loading all tools into every request
Journey Context:
OpenAI/Anthropic embed full JSON schemas into the context window for every request. A detailed tool with nested properties and descriptions consumes 500-2000 tokens. With 10\+ tools, this exceeds the user query cost. Common mistake: treating schemas as documentation. The model selects tools based on type signatures and enum values; natural language descriptions add noise without improving accuracy. Sharding strategy: Use Haiku/4o-mini to classify intent and select tool subset \(e.g., 'billing' vs 'technical' tools\), then call expensive model with reduced schema. Order-of-magnitude: 10 detailed tools ≈ 8k tokens \($0.24 GPT-4o\) vs sharded approach ≈ 1k tokens \($0.03\). Quality degradation signature: Sharding fails when tool boundaries are ambiguous \(e.g., 'refund' overlaps billing and support\); watch for 400 errors or tool selection loops.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:30:00.730534+00:00— report_created — created