Report #31645
[cost_intel] OpenAI function definitions silently consume 20-40% of the context window per request
Aggressively prune function schemas (remove descriptions and examples, use abbreviated keys) or switch to dynamic function selection (routing to specialized agents with smaller toolsets) rather than loading all tools into every turn.
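A minimal sketch of the pruning approach, assuming Chat Completions-style tool definitions (a "function" object with "name", "description", and "parameters"). The helper names, the set of dropped keywords, and the sample tool are hypothetical. The sketch keeps the function-level description by default, which is an assumption on my part rather than something the report prescribes, and it compares serialized character counts only as a rough stand-in for a real token count.

```python
import json

# JSON Schema keywords treated as documentation-only (an assumption; adjust as needed).
DROP_KEYS = {"description", "examples", "title"}


def prune_schema(schema):
    """Return a copy of a JSON Schema with documentation-only keywords removed."""
    if isinstance(schema, list):
        return [prune_schema(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    pruned = {}
    for key, value in schema.items():
        if key in DROP_KEYS:
            continue
        if key == "properties" and isinstance(value, dict):
            # Keys under "properties" are parameter names, not keywords: keep them all.
            pruned[key] = {name: prune_schema(sub) for name, sub in value.items()}
        else:
            pruned[key] = prune_schema(value)
    return pruned


def prune_tool(tool, keep_function_description=True):
    """Prune the parameter schema of one tool without mutating the original."""
    fn = dict(tool["function"])
    fn["parameters"] = prune_schema(fn.get("parameters", {}))
    if not keep_function_description:
        fn.pop("description", None)
    return {"type": tool["type"], "function": fn}


def compact_size(obj):
    """Character length of the compact JSON form (a rough proxy, not a token count)."""
    return len(json.dumps(obj, separators=(",", ":")))


# Hypothetical tool definition used only for illustration.
verbose_tool = {
    "type": "function",
    "function": {
        "name": "search_orders",
        "description": "Search customer orders by status.",
        "parameters": {
            "type": "object",
            "properties": {
                "status": {
                    "type": "string",
                    "enum": ["open", "shipped", "cancelled"],
                    "description": "Order status to filter on.",
                    "examples": ["open"],
                },
                # A real parameter that happens to be named "description".
                "description": {
                    "type": "string",
                    "description": "Free-text match against the order description.",
                },
            },
            "required": ["status"],
        },
    },
}

pruned_tool = prune_tool(verbose_tool)
print(compact_size(verbose_tool), "->", compact_size(pruned_tool))
```

Measuring before and after on your own schemas is the point of the last line: the saving depends heavily on how verbose the original descriptions are, and any accuracy loss from removing them needs to be checked against real traffic.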
Journey Context:
Developers calculate token costs assuming only the message history matters. They overlook that the 'tools' array (function definitions) is injected into the system/developer message context on every single request and is billed as input tokens each time. A single complex OpenAPI-style schema with nested objects, examples, and verbose descriptions can easily run to 2k-4k tokens. With 10 such tools, that is 20k-40k tokens per request before any user input. People try to fix this by enabling response_format or shortening the user prompt, which is futile because the bloat is upstream of both. The alternatives are: (1) Schema compression (strip all 'description' fields, use 1-letter property names, remove 'examples' and 'enum' explanations), which saves ~60% of schema tokens but hurts model performance slightly; (2) Tool routing: use a cheap, fast classifier model (e.g., Claude 3 Haiku or gpt-4o-mini) to select which subset of tools to load for this specific turn, paying the schema tax only on 1-2 tools instead of 10. For high-volume agents, option 2 is the only scalable fix.
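A minimal sketch of option 2 (tool routing), again assuming Chat Completions-style tool definitions. The registry, tool names, and keyword table are hypothetical, and the keyword matcher is only an offline stand-in for the cheap classifier model described above; in practice the classifier callable would wrap a call to that model and return tool names.

```python
import re
from typing import Callable, Dict, List


def make_tool(name: str, description: str) -> dict:
    """Build a minimal tool definition (hypothetical helper for this sketch)."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": {}},
        },
    }


# Hypothetical registry of every tool the agent could use: name -> definition.
TOOL_REGISTRY: Dict[str, dict] = {
    "search_orders": make_tool("search_orders", "Search customer orders."),
    "issue_refund": make_tool("issue_refund", "Refund a payment."),
    "lookup_customer": make_tool("lookup_customer", "Look up a customer record."),
}

# Hypothetical routing hints used by the stand-in classifier below.
ROUTING_KEYWORDS = {
    "search_orders": {"order", "orders", "shipment"},
    "issue_refund": {"refund", "chargeback"},
    "lookup_customer": {"customer", "account", "email"},
}


def keyword_classifier(user_message: str, tool_names: List[str]) -> List[str]:
    """Stand-in for a cheap model call: pick tools whose keywords appear in the message."""
    words = set(re.findall(r"[a-z]+", user_message.lower()))
    return [name for name in tool_names if words & ROUTING_KEYWORDS.get(name, set())]


def select_tools(
    user_message: str,
    registry: Dict[str, dict],
    classifier: Callable[[str, List[str]], List[str]] = keyword_classifier,
    max_tools: int = 2,
) -> List[dict]:
    """Return only the tool definitions chosen for this turn, capped at max_tools."""
    chosen = classifier(user_message, list(registry))
    # A model-based classifier may return unknown names; drop anything not registered.
    valid = [name for name in chosen if name in registry][:max_tools]
    return [registry[name] for name in valid]


tools_for_turn = select_tools("I want a refund for my order", TOOL_REGISTRY)
print([tool["function"]["name"] for tool in tools_for_turn])
```

The list returned by select_tools is what would go into the 'tools' array for that turn. When nothing matches, the sketch returns an empty list; whether to fall back to the full toolset or to send the request without tools in that case is a design choice the caller has to make.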
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T07:30:23.395152+00:00— report_created — created