Report #29143
[cost_intel] Large JSON Schema tool definitions consume more context tokens than the tool calls save: net negative for the context window
Compress tool schemas by removing descriptions from nested properties, using $ref for shared structures, and dynamically loading only the relevant tools per turn; or switch to 'functions'-style definitions with a minimal schema.
Journey Context:
Engineers often assume that detailed tool schemas (10-20k tokens of JSON Schema, with a description on every property) are efficient because they reduce hallucination. However, the tool definitions are sent with EVERY request (typically injected into the system prompt), while actual tool calls are rare (5-10% of turns). The math does not work out: each request pays about 15k tokens of schema in exchange for roughly 1k tokens saved per tool call. Over 100 turns, that is about 1.5M schema tokens spent to save 5-10k tokens across 5-10 tool calls. A common mistake is including full OpenAPI specs as tool definitions.

The fix is schema compression: strip descriptions from self-explanatory fields (keep them only on ambiguous ones), use $ref to avoid repeating shared structures, and add 'tool routing', where an intent classifier selects only the 2-3 relevant tools to include in each request.
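A minimal sketch of per-turn tool routing, assuming each tool carries a hand-written keyword set. Keyword overlap here stands in for the intent classifier described above (an embedding model or a small trained classifier would be the realistic choice). `Tool`, `keywords`, `route_tools`, and `payload_chars` are hypothetical names, and the `name`/`description`/`parameters` payload is a generic function-style layout to adapt to your provider. Schema size is measured in compact JSON characters, a rough proxy for tokens.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Tool:
    """A tool definition plus hand-written routing keywords (hypothetical field)."""
    name: str
    description: str
    parameters: dict
    keywords: frozenset = field(default_factory=frozenset)

    def to_payload(self) -> dict:
        # Generic function-style shape; rename fields to match your provider.
        return {"name": self.name, "description": self.description,
                "parameters": self.parameters}


def _words(text: str) -> set:
    """Lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def route_tools(message: str, tools: list, k: int = 3) -> list:
    """Pick at most k tools whose keywords overlap the message, best match first.

    Keyword overlap is a stand-in for a real intent classifier. Ties keep the
    original tool order; no match means no tools are sent this turn.
    """
    words = _words(message)
    scored = [(len(words & tool.keywords), i, tool) for i, tool in enumerate(tools)]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:k]]


def payload_chars(tools: list) -> int:
    """Compact JSON size of the tool definitions: a rough proxy for schema tokens."""
    return len(json.dumps([t.to_payload() for t in tools], separators=(",", ":")))


if __name__ == "__main__":
    def obj(**props):
        return {"type": "object", "properties": props}

    tools = [
        Tool("get_weather", "Current weather for a city.",
             obj(city={"type": "string"}), frozenset({"weather", "rain", "forecast"})),
        Tool("create_invoice", "Create an invoice for a customer.",
             obj(customer_id={"type": "string"}, amount={"type": "number"}),
             frozenset({"invoice", "bill", "charge"})),
        Tool("search_orders", "Search orders by status or date.",
             obj(status={"type": "string"}), frozenset({"order", "orders", "tracking"})),
    ]
    routed = route_tools("Will it rain in Oslo tomorrow?", tools)
    print([t.name for t in routed])  # ['get_weather']
    print(payload_chars(tools), "->", payload_chars(routed), "chars of schema")
```

Returning an empty list when nothing matches means ordinary chat turns carry no tool schemas at all; a more conservative variant would fall back to a small default set instead.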
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:18:39.681511+00:00 - report_created - created