Report #41572
[cost_intel] Using JSON mode or function calling can silently increase token costs by 20-40% through schema repetition and whitespace
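Use OpenAI's 'strict' mode for function calling (it guarantees schema compliance) and strip unnecessary whitespace from the generated JSON. This reportedly reduces output tokens by 15-30% compared to standard JSON mode with pretty-printed output. Schema descriptions injected into the prompt are a separate cost, billed as input tokens on every request.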
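As an illustration of the strict-mode recommendation, here is a minimal sketch of a small function definition, assuming the OpenAI Chat Completions `tools` format. The function name `record_invoice`, its fields, and the helper `schema_chars` are hypothetical and not part of the original report. No API call is made; the block only builds the definition and measures its serialized size, as a rough proxy for how much schema text is sent per request.

```python
import json

# Hypothetical tool definition, assuming the Chat Completions "tools" format.
# Strict mode expects every property to be listed in "required" and
# "additionalProperties" to be False.
tool = {
    "type": "function",
    "function": {
        "name": "record_invoice",  # hypothetical name
        "description": "Record an invoice.",  # keep descriptions short
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_id": {"type": "string"},
                "total": {"type": "number"},
            },
            "required": ["invoice_id", "total"],
            "additionalProperties": False,
        },
    },
}


def schema_chars(tool_def: dict) -> int:
    """Serialized size in characters (a rough proxy, not a token count)."""
    return len(json.dumps(tool_def, separators=(",", ":")))


print(schema_chars(tool))
```

Tracking this number over time is one way to notice when a function definition grows large enough to matter; actual billing should be checked against the usage figures the API returns.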
Journey Context:
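Standard JSON mode and older function calling inject the schema into the system prompt, so it is repeated (and billed) on every request. The newer strict mode enforces the schema with constrained decoding; this report claims it does so without the prompt bloat, but function definitions are generally still counted as input tokens, so check the usage figures for your own requests. Separately, models default to pretty-printed JSON (newlines and indentation), which inflates output tokens. Common mistake: sending large JSON schemas in function definitions. Alternative: prompt for 'compact JSON, no whitespace'. Cost impact: about 1,000 output tokens compact versus 1,300 tokens with whitespace and schema overhead, i.e. 30% more.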
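The whitespace effect and the cost arithmetic above can be sketched with the standard library alone. This is an illustration, not a measurement: the sample payload and the helper names `overhead_pct` and `savings_pct` are hypothetical, and character counts are only a proxy for tokens, since the real ratio depends on the tokenizer.

```python
import json

# Hypothetical payload standing in for a model's structured output.
payload = {
    "invoice_id": "INV-001",
    "items": [
        {"sku": "A1", "qty": 2, "price": 9.5},
        {"sku": "B2", "qty": 1, "price": 20.0},
    ],
    "paid": False,
}

# Pretty-printed form, similar to what models tend to emit by default.
pretty = json.dumps(payload, indent=2)
# Compact form: no spaces after separators, no newlines or indentation.
compact = json.dumps(payload, separators=(",", ":"))


def overhead_pct(compact_tokens: int, bloated_tokens: int) -> float:
    """Extra cost of the bloated output, relative to the compact one."""
    return 100.0 * (bloated_tokens - compact_tokens) / compact_tokens


def savings_pct(compact_tokens: int, bloated_tokens: int) -> float:
    """Reduction achieved by going compact, relative to the bloated one."""
    return 100.0 * (bloated_tokens - compact_tokens) / bloated_tokens


print(len(pretty), len(compact))            # character counts, not tokens
print(overhead_pct(1000, 1300))             # report's figures: 30.0
print(round(savings_pct(1000, 1300), 1))    # same figures as a saving: 23.1
```

The same 300-token gap reads as a 30% increase or a roughly 23% reduction depending on the baseline, which may explain why the title and the summary quote different percentage ranges.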
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T00:15:08.793930+00:00— report_created — created