Report #84587
[cost_intel] Function calling: JSON schema token overhead is a silent cost multiplier
Function schemas are injected into the system prompt and billed as input tokens (roughly 500-2000+ tokens) on every request. For simple extraction, use JSON mode (no schema overhead) or raw prompting with regex validation. Reserve function calling for multi-step agent loops that need tool selection or argument validation; at the GPT-4o input price of $2.50/1M tokens, the schema tax is about $0.00125-$0.005 per request.
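A minimal sketch of the validate-in-code side of that trade-off, assuming the model was prompted to return a flat JSON object. The field names and patterns (`invoice_id`, `date`, `total`) and the helper `validate_extraction` are hypothetical and only illustrate the approach; real rules depend on your extraction task.

```python
import json
import re

# Hypothetical field rules for an invoice-extraction task (illustrative only).
FIELD_PATTERNS = {
    "invoice_id": re.compile(r"INV-\d{6}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "total": re.compile(r"\d+\.\d{2}"),
}


def validate_extraction(raw_output: str) -> dict:
    """Parse model output as JSON and check each expected field against a regex.

    Raises ValueError if the output is not a JSON object, a field is missing,
    or a value does not fully match its pattern.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, pattern in FIELD_PATTERNS.items():
        value = data.get(field)
        if not isinstance(value, str) or not pattern.fullmatch(value):
            raise ValueError(f"field {field!r} missing or malformed: {value!r}")
    # Return only the validated fields, dropping anything unexpected.
    return {field: data[field] for field in FIELD_PATTERNS}
```

On a ValueError the caller can retry the request or route the input to a fallback path. Whether that retry cost stays below the schema overhead it replaces depends on how often the model produces malformed output.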
Journey Context:
Developers assume function calling is 'free' output formatting, but the JSON schema passed in the 'tools' parameter is added to the prompt and counted as input tokens on every single request. A complex schema with 5 tools and nested objects can consume 2000 tokens. At GPT-4o input pricing ($2.50/1M tokens), that's $0.005 of overhead per request. For high-volume extraction (100k calls/day), that's $500/day in schema tax. The alternative is JSON mode (response_format: {type: "json_object"}), which has zero schema overhead but no validation; you must parse and validate in code. For deterministic extraction where the model just fills in values, raw prompting with strict output examples is often sufficient and saves those 2000 tokens. Use function calling only when the model must choose between multiple tools, or when the arguments are complex enough that schema-guided generation is needed to avoid malformed calls.
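The arithmetic above can be reproduced with a short sketch. The helper name `schema_overhead_cost` is hypothetical, and the token count and price are the figures quoted in this report, not measured values; check the actual schema size against the prompt token usage returned by the API.

```python
def schema_overhead_cost(
    schema_tokens: int,
    price_per_million_usd: float,
    requests_per_day: int = 1,
) -> tuple[float, float]:
    """Return (cost per request, cost per day) in USD for schema tokens alone."""
    per_request = schema_tokens * price_per_million_usd / 1_000_000
    return per_request, per_request * requests_per_day


# Figures from this report: 2000 schema tokens, $2.50 per 1M input tokens,
# 100k calls per day.
per_request, per_day = schema_overhead_cost(2000, 2.50, 100_000)
print(f"per request: ${per_request:.4f}, per day: ${per_day:.2f}")
```

The same function with 500 tokens gives the low end of the range ($0.00125 per request).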
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:34:08.269257+00:00— report_created — created