Report #69978
[cost_intel] OpenAI function definition token multiplication across multi-turn conversations
Pre-calculate the 'tool tax': if tool_definition_tokens × expected_turns exceeds avg_result_savings, flatten the tools to text or use oneOf unions to compress the schemas; for conversations longer than 10 turns, switch to stateless tool simulation.
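The break-even check above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and `avg_result_savings` is assumed here to be a token count for the whole conversation (the tokens that structured tool results are expected to save), which the report does not define precisely.

```python
def tool_tax(tool_definition_tokens: int, expected_turns: int) -> int:
    """Tokens spent resending the tool definitions over a whole conversation."""
    return tool_definition_tokens * expected_turns


def should_flatten_tools(
    tool_definition_tokens: int,
    expected_turns: int,
    avg_result_savings: int,
) -> bool:
    """True when the tool tax outweighs the assumed savings (both in tokens)."""
    return tool_tax(tool_definition_tokens, expected_turns) > avg_result_savings


# 500-token definitions replayed over 50 turns.
print(tool_tax(500, 50))  # 25000
print(should_flatten_tools(500, 50, avg_result_savings=10_000))  # True
```

The token counts are plain inputs; in practice they would come from a tokenizer or from the usage figures returned with each API response.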
Journey Context:
OpenAI's Chat Completions API is stateless, so the entire tools array (JSON Schema definitions) is sent and counted as input tokens on every single API call. A 500-token tool definition in a 50-turn conversation consumes 25,000 tokens just for metadata, before any user content or tool results: $0.75 at GPT-4's list price of $0.03 per 1K input tokens, or about $0.25 at GPT-4 Turbo's $0.01 per 1K. Developers often assume tools are 'loaded once' like libraries, but the definitions are replayed into the context on each turn. The signature is a high input token count despite short user messages. The fix is to calculate the break-even point: if tool metadata exceeds 20% of the expected conversation budget, either collapse multiple tools into a single flexible tool using oneOf unions (reducing schema size) or abandon function calling entirely in favor of plain text tool descriptions with regex parsing (losing structured guarantees, but by this report's estimate saving up to 90% of context tokens).
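A minimal sketch of the 20% budget check and of the plain text fallback with regex parsing follows. Everything named here is an assumption for illustration: the `CALL name(key="value")` line format is a hypothetical convention that the prompt would have to teach the model, and `metadata_share` and `parse_tool_call` are made-up helper names, not part of any OpenAI SDK.

```python
import re

# Hypothetical convention: the model is instructed to emit one line such as
#   CALL get_weather(city="Paris", units="metric")
CALL_RE = re.compile(r'^CALL\s+(?P<name>\w+)\((?P<args>.*)\)\s*$', re.MULTILINE)
ARG_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def metadata_share(tool_definition_tokens, expected_turns, conversation_budget_tokens):
    """Fraction of the conversation's token budget spent on tool definitions."""
    return tool_definition_tokens * expected_turns / conversation_budget_tokens


def parse_tool_call(reply):
    """Return (tool_name, args) from a model reply, or None if no call is found.

    Only double-quoted string arguments are recognised; unlike native function
    calling, nothing guarantees the model follows the format.
    """
    match = CALL_RE.search(reply)
    if match is None:
        return None
    args = dict(ARG_RE.findall(match.group("args")))
    return match.group("name"), args


# 500 tokens of definitions, 50 turns, an assumed 100,000-token budget.
share = metadata_share(500, 50, 100_000)
print(share > 0.20)  # True: above the 20% threshold

reply = 'Checking now.\nCALL get_weather(city="Paris", units="metric")'
print(parse_tool_call(reply))
```

Because the parser can return None or miss malformed arguments, callers would need their own retry or validation step, which is the loss of structured guarantees the report mentions.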
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:56:51.436995+00:00— report_created — created