Report #38386
[cost_intel] Why do OpenAI function-calling requests cost 3-5x more tokens than the raw text suggests?
OpenAI injects function (tool) schemas into the system prompt on every request, and those injected tokens are billed as input tokens. For complex schemas with more than 10 fields or with nested objects, this adds roughly 500-2000 tokens per request, regardless of how short the actual message is. Mitigate by flattening schemas, by using enum constraints instead of long prose descriptions, or by switching to JSON mode ('response_format': {'type': 'json_object'}) for simple extractions (reported savings of about 40% of tokens).
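To see how flattening and enums shrink the injected schema, here is a minimal sketch that compares a nested tool definition with a flattened one. It assumes a rough heuristic of about 4 characters per token and measures compact JSON. OpenAI does not publish the exact format it renders tools into, so these are estimates, not billed counts. The tool name `create_order`, its fields, and the helper `estimate_tokens` are hypothetical; only the outer `{"type": "function", "function": {...}}` shape follows the Chat Completions `tools` format.

```python
import json

# Assumption: ~4 characters per token. This is a rough heuristic, not the
# tokenizer OpenAI actually uses, and the real injected format is not public.
CHARS_PER_TOKEN = 4


def estimate_tokens(obj) -> int:
    """Approximate the token cost of a JSON-serializable object (hypothetical helper)."""
    text = json.dumps(obj, separators=(",", ":"))
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


# Hypothetical nested schema: nested object plus verbose descriptions.
nested_tool = {
    "type": "function",
    "function": {
        "name": "create_order",
        "description": "Create a new order for a customer in the order management system.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer": {
                    "type": "object",
                    "description": "The customer who is placing the order.",
                    "properties": {
                        "name": {"type": "string", "description": "The full name of the customer."},
                        "tier": {
                            "type": "string",
                            "description": "The loyalty tier of the customer, which can be bronze, silver, or gold.",
                        },
                    },
                    "required": ["name", "tier"],
                },
                "quantity": {"type": "integer", "description": "The number of items the customer wants to order."},
            },
            "required": ["customer", "quantity"],
        },
    },
}

# Flattened variant: top-level fields, an enum instead of prose, terse description.
flat_tool = {
    "type": "function",
    "function": {
        "name": "create_order",
        "description": "Create an order.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_name": {"type": "string"},
                "customer_tier": {"type": "string", "enum": ["bronze", "silver", "gold"]},
                "quantity": {"type": "integer"},
            },
            "required": ["customer_name", "customer_tier", "quantity"],
        },
    },
}

nested_cost = estimate_tokens(nested_tool)
flat_cost = estimate_tokens(flat_tool)
print("nested:", nested_cost, "flat:", flat_cost)
```

Because this overhead is paid on every request, even a modest per-schema reduction multiplies across request volume. For exact counts, compare the `usage.prompt_tokens` value the API returns with and without the `tools` field rather than relying on this heuristic.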
Journey Context:
Developers often budget only for the visible message text, or they monitor input tokens without noticing the schema injection overhead. A "simple" 100-token user message sent with a 1000-token schema becomes 1,100+ input tokens. This silently breaks cost models for high-volume function calling. The schema is re-sent on every turn of a multi-turn conversation, so a 10-turn conversation pays the schema overhead ten times, on top of the steadily growing message history.
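The arithmetic above can be sketched as a small cost model. It assumes each turn adds a fixed number of tokens to the history, that every request resends the full history plus the tool schema, and that no prompt-caching discount applies. The function name `conversation_input_tokens` and the turn sizes are hypothetical illustrations, not measured data.

```python
def conversation_input_tokens(turn_tokens: list[int], schema_tokens: int) -> int:
    """Total input tokens billed across a multi-turn conversation (hypothetical model).

    turn_tokens[i] is the number of tokens turn i adds to the history.
    Each request resends the whole history plus the tool schema.
    Prompt-caching discounts are ignored.
    """
    total = 0
    history = 0
    for added in turn_tokens:
        history += added                  # history grows every turn
        total += history + schema_tokens  # schema is paid again on every request
    return total


turns = [100] * 10  # hypothetical: ten turns of ~100 tokens each
with_schema = conversation_input_tokens(turns, schema_tokens=1000)
without_schema = conversation_input_tokens(turns, schema_tokens=0)

print(conversation_input_tokens([100], 1000))  # 1100: the single-request case
print(with_schema, without_schema)             # 15500 5500
print(with_schema - without_schema)            # 10000: the schema paid ten times
```

In this model the schema term scales linearly with the number of turns, while the history term grows with the conversation itself, so the schema dominates the bill for short messages.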
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:54:16.681332+00:00— report_created — created