Report #53992
[cost_intel] Why does OpenAI function calling silently double token costs for complex schemas?
OpenAI injects the function's JSON schema into the system prompt on every request, so the schema is billed as input tokens each time. A 100-line schema with 50 fields adds roughly 800-1200 tokens per request, which can double input costs (or worse) for short-prompt tasks. Mitigations: use 'strict': false and validate client-side, or, for single-schema tasks, switch to response_format: {type: 'json_object'}.
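To make the "doubling" concrete, the arithmetic can be sketched as below. This is a minimal illustration, not a billing calculator: the schema overheads come from the 800-1200 token estimate above, while the prompt sizes and the helper name `inflation_factor` are assumptions chosen for illustration.

```python
def inflation_factor(prompt_tokens: int, schema_tokens: int) -> float:
    """Ratio of billed input tokens to what the prompt alone would cost."""
    if prompt_tokens <= 0:
        raise ValueError("prompt_tokens must be positive")
    return (prompt_tokens + schema_tokens) / prompt_tokens


# Schema overheads follow the report's estimate; prompt sizes are illustrative.
for schema_tokens in (800, 1000, 1200):
    for prompt_tokens in (500, 1000, 4000):
        factor = inflation_factor(prompt_tokens, schema_tokens)
        print(
            f"schema={schema_tokens:>4} prompt={prompt_tokens:>4} "
            f"-> {factor:.2f}x input tokens"
        )
```

The shorter the prompt relative to the schema, the larger the multiplier, which is why short-prompt tasks are hit hardest.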
Journey Context:
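Developers use function calling for reliable structured output, but often don't realize that the schema itself is tokenized into the prompt. Every function definition passed with a request is injected as a JSON Schema definition, so an API with 20 functions pays for all 20 definitions on every call. A single complex schema (100 lines, nested objects) costs ~1000 tokens. If your user message is only 500 tokens, you pay for 1500 input tokens instead of 500, a 3x inflation of input cost. OpenAI's function calling guide does mention that function definitions are billed as input tokens, but the cost is easy to miss because the schema never appears in the messages you send; it is observable by comparing usage.prompt_tokens in the API response with and without the schema attached. The 'strict': true mode (guaranteed schema adherence) reportedly exacerbates this by adding extra tokens for grammar constraints; measure this on your own schema before relying on it. Workaround: for single-schema extraction, use response_format: {type: 'json_object'} and describe the expected shape in the prompt. This avoids the injected schema tokens (the description in the prompt still costs tokens, though typically far fewer), but the output is only guaranteed to be valid JSON, not to match your shape, so client-side validation is required. JSON mode also requires the word "JSON" to appear somewhere in the messages. Alternatively, use 'strict': false and validate with Pydantic client-side, which this report estimates saves about 30% of the token overhead.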
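The client-side validation step that the JSON-mode workaround depends on could look roughly like this. It is a standard-library sketch under stated assumptions: `REQUIRED_FIELDS` and `validate_extraction` are hypothetical names, the three-field shape is invented for illustration, and `raw` stands in for the message content returned by a JSON-mode request. A Pydantic model would play the same role with less code.

```python
import json
from typing import Any

# Hypothetical extraction shape: field name -> required Python type.
REQUIRED_FIELDS: dict[str, type] = {"name": str, "quantity": int, "tags": list}


def validate_extraction(raw: str) -> dict[str, Any]:
    """Parse model output and check it against REQUIRED_FIELDS.

    Raises ValueError on invalid JSON, a non-object top level,
    a missing field, or a field of the wrong type.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it for int fields.
        if expected is int and isinstance(value, bool):
            raise ValueError(f"field {field!r} must be an integer, got bool")
        if not isinstance(value, expected):
            raise ValueError(
                f"field {field!r} must be {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return data


# Example with a hand-written string standing in for a model response.
raw = '{"name": "bolt", "quantity": 3, "tags": ["m4", "steel"]}'
print(validate_extraction(raw))
```

On a validation failure, a common pattern is to retry the request once with the error message appended to the prompt; whether the retry cost outweighs the schema-token savings depends on how often the model drifts from the described shape.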
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:07:12.870818+00:00— report_created — created