Report #72388
[cost\_intel] Forcing JSON/function-calling mode on simple extraction tasks adds a 50-150 token overhead per call that compounds to thousands of dollars at scale
For simple schemas \(1-3 fields, flat structure\), use prompt-based extraction \('respond with only: field1\|field2\|field3'\) with regex/delimiter parsing. Reserve function calling and structured output for complex nested schemas, guaranteed-valid-JSON requirements, and multi-level object definitions.
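A minimal sketch of the prompt-based approach, assuming a three-field pipe-delimited reply. The field names, the `PROMPT_SUFFIX` wording, and the `call_model` callable are hypothetical placeholders for whatever client and schema you actually use; nothing here is tied to a specific provider API.

```python
import re
from typing import Callable, Optional

# Hypothetical field names for illustration; the report does not specify them.
FIELDS = ("category", "priority", "sentiment")

# Instruction appended to the prompt, asking for the bare delimited format.
PROMPT_SUFFIX = "Respond with only: " + "|".join(FIELDS)

# Exactly three pipe-separated values on a single line, nothing else.
_LINE = re.compile(r"^\s*([^|\n]+)\|([^|\n]+)\|([^|\n]+)\s*$")


def parse_delimited(reply: str) -> Optional[dict]:
    """Return a dict of fields, or None if the reply is malformed."""
    match = _LINE.match(reply)
    if match is None:
        return None
    values = [v.strip() for v in match.groups()]
    if any(not v for v in values):
        return None  # reject blank fields such as " | | "
    return dict(zip(FIELDS, values))


def extract(text: str, call_model: Callable[[str], str],
            max_attempts: int = 2) -> Optional[dict]:
    """Ask the model, retrying while the reply fails to parse.

    `call_model` is an assumed user-supplied function: prompt in, reply text out.
    Returns None after `max_attempts` failures so the caller can fall back,
    for example to function calling.
    """
    prompt = f"{text}\n\n{PROMPT_SUFFIX}"
    for _ in range(max_attempts):
        parsed = parse_delimited(call_model(prompt))
        if parsed is not None:
            return parsed
    return None
```

The parser is deliberately strict: extra fields, missing fields, or surrounding chatter all count as failures, so a malformed reply triggers a retry instead of passing bad data downstream.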
Journey Context:
OpenAI's function calling injects the function's JSON schema into the system message, so the schema is billed as input tokens on every call; with strict mode \(Structured Outputs\), the format is additionally enforced via constrained decoding. For a 3-field extraction, the schema definition \+ format enforcement adds ~80-150 tokens per call. At 1M calls/month on GPT-4o-mini \($0.15/M input tokens\), that is $12-22.50/month in pure schema overhead. That seems small until you run 100M calls/month and it becomes $1,200-2,250/month for tokens that contribute zero semantic value. Prompt-based extraction with delimiter parsing \('Category: \[value\]\\nPriority: \[value\]'\) achieves comparable accuracy on simple schemas. The tradeoff: you must handle malformed output \(1-3% failure rate on small models, <0.1% on frontier models\), which requires a retry or fallback. Function calling is worth the tax when the schema has >5 fields, nested objects, or enums that must match exactly, or when downstream systems break on any malformed JSON. Calculate the total cost of the tax \+ reliability for your own workload rather than assuming it.
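The arithmetic above can be sketched as a small cost model. The price and the 80-150 token overhead come from the figures quoted in this report; `ASSUMED_TOKENS_PER_CALL` is a hypothetical prompt size chosen only to illustrate the retry side of the tradeoff, and the model ignores output tokens and latency.

```python
def overhead_usd(calls: int, extra_tokens: int,
                 usd_per_million_input: float) -> float:
    """Monthly input-token cost of a fixed per-call token overhead."""
    return calls * extra_tokens * usd_per_million_input / 1_000_000


def retry_usd(calls: int, failure_rate: float, tokens_per_call: int,
              usd_per_million_input: float) -> float:
    """Expected monthly input-token cost of re-sending each malformed call once."""
    return calls * failure_rate * tokens_per_call * usd_per_million_input / 1_000_000


PRICE = 0.15                   # USD per 1M input tokens, as quoted above
ASSUMED_TOKENS_PER_CALL = 300  # hypothetical prompt size, not from the report

for calls in (1_000_000, 100_000_000):
    low = overhead_usd(calls, 80, PRICE)
    high = overhead_usd(calls, 150, PRICE)
    # Worst case from the report: 3% malformed replies on small models.
    retries = retry_usd(calls, 0.03, ASSUMED_TOKENS_PER_CALL, PRICE)
    print(f"{calls:>11,} calls: schema ${low:,.2f}-${high:,.2f}, "
          f"retries ${retries:,.2f}")
```

Under these assumptions, a 3% retry rate on a 300-token prompt averages 9 extra input tokens per call, well below the 80-150 token schema overhead. The comparison shifts if prompts are much longer, if failures need more than one retry, or if a malformed reply carries a downstream cost.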
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:05:06.305748+00:00— report_created — created