Report #95715
[cost_intel] OpenAI Function Calling adds 15-20% token overhead vs JSON mode and 40% vs raw text, but the silent cost is the 'explanation tokens' emitted before the structured output
Force strict tool schema compliance via 'strict': true on the OpenAI function definition, or use constrained decoding libraries (Outlines, Guidance), to eliminate 'explanation' tokens. For simple extraction, use raw text with delimiters and regex parsing instead of Function Calling; this reduces tokens by 30-40% and latency by 200-500ms.
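A minimal sketch of a strict tool definition, assuming the Chat Completions `tools` format. The tool name `extract_order`, its fields, and the helper `check_strict_schema` are hypothetical and only illustrate the idea. In strict mode OpenAI expects every property to be listed in `required` and `additionalProperties: false` on every object, with optional fields expressed as nullable types; the helper checks those two rules locally before any request is sent.

```python
from typing import Any

# Hypothetical extraction tool; names and fields are illustrative only.
EXTRACT_ORDER_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "extract_order",
        "description": "Extract order fields from the user's message.",
        "strict": True,  # ask OpenAI to enforce this schema exactly
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string"},
                "quantity": {"type": "integer"},
                "status": {"type": "string", "enum": ["pending", "shipped", "cancelled"]},
                # "Optional" field: strict mode wants it required but nullable.
                "note": {"type": ["string", "null"]},
            },
            "required": ["order_id", "quantity", "status", "note"],
            "additionalProperties": False,
        },
    },
}


def check_strict_schema(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Hypothetical helper: list strict-mode rule violations in nested objects."""
    problems: list[str] = []
    if schema.get("type") == "object" or "properties" in schema:
        props = schema.get("properties", {})
        # Rule 1: every declared property must appear in "required".
        missing = set(props) - set(schema.get("required", []))
        if missing:
            problems.append(f"{path}: not required: {sorted(missing)}")
        # Rule 2: objects must explicitly forbid extra keys.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems.extend(check_strict_schema(sub, f"{path}.{name}"))
    # Recurse into array item schemas as well.
    if schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(check_strict_schema(schema["items"], f"{path}[]"))
    return problems
```

The definition would then be passed as `tools=[EXTRACT_ORDER_TOOL]` to `client.chat.completions.create(...)`, optionally with `tool_choice={"type": "function", "function": {"name": "extract_order"}}` to force this tool; whether that fully removes the preamble in your traces is worth measuring rather than assuming.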
Journey Context:
Engineers assume Function Calling overhead is just negligible schema wrapping, but production tracing shows models emit reasoning text ('Let me analyze the user's request to determine the correct parameters...') before the structured output unless it is explicitly suppressed. Setting 'strict': true and 'temperature': 0 reduces this preamble, but the schema overhead remains. Raw text with careful prompting ('Respond only with JSON: {...}') followed by regex extraction avoids the function-calling parser tax. The tradeoff: Function Calling provides automatic schema validation and handles edge cases (nulls, enums) robustly, while raw text fails silently on malformed outputs. Use raw text only for high-volume, simple-schema, latency-sensitive extraction where 99% accuracy is acceptable and you can afford regex cleanup.
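A minimal sketch of the raw-text path, assuming the model was prompted along the lines of 'Respond only with JSON: {...}' and may still prepend an explanation. The function `extract_order`, the exception `ExtractionError`, the required keys, and the allowed `status` values are all hypothetical. Because raw text gets no schema enforcement from the API, the sketch checks nulls, types, and enums itself and raises instead of passing a bad record downstream, which turns the silent failure mode into a loud one.

```python
import json
import re
from typing import Any

# Greedy match from the first "{" to the last "}" so that a preamble such as
# "Let me analyze the request..." before the JSON object is discarded.
JSON_OBJECT_RE = re.compile(r"\{.*\}", re.DOTALL)

# Hypothetical schema: required keys and the enum for "status".
REQUIRED_KEYS = {"order_id", "quantity", "status"}
ALLOWED_STATUS = {"pending", "shipped", "cancelled"}


class ExtractionError(ValueError):
    """Raised when a raw-text response cannot be trusted."""


def extract_order(raw: str) -> dict[str, Any]:
    match = JSON_OBJECT_RE.search(raw)
    if match is None:
        raise ExtractionError("no JSON object found in response")
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ExtractionError(f"malformed JSON: {exc}") from exc
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ExtractionError(f"missing keys: {sorted(missing)}")
    # Checks that strict Function Calling would otherwise enforce for us.
    if not isinstance(data["order_id"], str) or not data["order_id"]:
        raise ExtractionError("order_id must be a non-empty string")
    if not isinstance(data["quantity"], int) or isinstance(data["quantity"], bool):
        raise ExtractionError("quantity must be an integer")
    if not isinstance(data["status"], str) or data["status"] not in ALLOWED_STATUS:
        raise ExtractionError(f"unexpected status: {data['status']!r}")
    return data


if __name__ == "__main__":
    reply = 'Let me analyze the request... {"order_id": "A-17", "quantity": 2, "status": "shipped"}'
    print(extract_order(reply))
```

The greedy regex is a deliberate simplification: it assumes a single JSON object per reply, so a response containing two objects or stray braces in trailing prose would fail JSON parsing and raise rather than be silently accepted.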
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:14:29.181299+00:00— report_created — created