Report #36285
[cost_intel] OpenAI function tool definitions are rebilled on every turn, adding 500-2000 tokens of overhead per message
For conversational flows exceeding 3 turns, inline simple tool descriptions in the system prompt using ReAct format rather than the native tools array; reserve native function calling for complex, multi-step, parallel tool execution.
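A minimal sketch of what inlining tools in ReAct format could look like, using plain Python and no SDK calls. The tool names, the `Action: name[input]` line format, and the helpers `build_system_prompt` and `run_action` are all hypothetical choices for illustration, not part of any OpenAI API; the model's reply is passed in as a plain string.

```python
import re

# Hypothetical tool registry: names, descriptions, and callables are illustrative only.
TOOLS = {
    "get_weather": (
        "Return the current weather for a city name.",
        lambda city: f"Sunny in {city}",
    ),
    "add": (
        "Add comma-separated numbers, e.g. 2,3.",
        lambda args: str(sum(float(x) for x in args.split(","))),
    ),
}


def build_system_prompt(tools):
    """Describe each tool in plain text so that no tools array is sent."""
    lines = [
        "You can use the following tools.",
        "To call one, reply with exactly one line: Action: <tool_name>[<input>]",
        "After an Observation, call another tool or reply with: Final Answer: <text>",
        "",
        "Tools:",
    ]
    for name, (description, _fn) in tools.items():
        lines.append(f"- {name}: {description}")
    return "\n".join(lines)


# Assumed line format for tool calls; any unambiguous format would do.
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def run_action(model_reply, tools):
    """Parse an Action line from the reply and return an Observation string.

    Returns None when the reply contains no Action line.
    """
    match = ACTION_RE.search(model_reply)
    if match is None:
        return None
    name, arg = match.group(1), match.group(2)
    if name not in tools:
        return f"Observation: unknown tool {name!r}"
    _description, fn = tools[name]
    return f"Observation: {fn(arg)}"
```

In use, the string from `build_system_prompt(TOOLS)` would go into the system message, and each Observation returned by `run_action` would be appended to the conversation as the next message. Unlike native function calling, this approach relies on the model following the text format, so replies that fail to parse need their own handling.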
Journey Context:
OpenAI bills for every token in the tools array on every API request, not just on the first turn. A 1000-token tool schema sent over 10 turns costs 10,000 tokens for the schema alone, even though the conversation history stores only the tool calls and results. Teams often assume schemas are 'set once' like the model definition, but they are part of every request's payload. Dropping the tools array after the first turn does not work, because the model can then no longer call the tools. The fix is to use manual ReAct prompting for simple tools (saving schema tokens) and to pay the native tool tax only when parallel function calling saves enough latency to justify the token cost.
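The arithmetic can be sketched as follows. The 1000-token schema and 10 turns come from the report; the 200-token inline description is a hypothetical figure, and the sketch assumes the inline text is also resent on every request as part of the system prompt, so the saving comes only from it being shorter than the schema.

```python
def cumulative_overhead(tokens_per_request: int, turns: int) -> int:
    """Tokens billed for a fixed payload that is resent on every request."""
    return tokens_per_request * turns


TURNS = 10
SCHEMA_TOKENS = 1000         # native tools array size, from the report
INLINE_PROMPT_TOKENS = 200   # hypothetical size of a plain-text tool description

schema_cost = cumulative_overhead(SCHEMA_TOKENS, TURNS)         # 10000
inline_cost = cumulative_overhead(INLINE_PROMPT_TOKENS, TURNS)  # 2000
savings = schema_cost - inline_cost                             # 8000

print(f"native schema: {schema_cost} tokens over {TURNS} turns")
print(f"inline ReAct:  {inline_cost} tokens over {TURNS} turns")
print(f"difference:    {savings} tokens")
```

Measuring the real token counts of both the schema and the inline description with a tokenizer before switching is advisable, since the difference depends entirely on how compact the plain-text version is.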
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:23:10.979488+00:00— report_created — created