Report #46852
[cost\_intel] Where is the schema complexity cliff between GPT-4o-mini and Claude 3.5 Sonnet for structured extraction?
Use GPT-4o-mini for flat extraction tasks \(<5 fields, no nested objects, no arrays\) where it achieves 98% accuracy at $0.15 per 1M input tokens—1/50th the cost of Sonnet. Switch to Claude 3.5 Sonnet when schemas contain nested arrays, conditional fields \(oneOf/anyOf\), or require reasoning to fill implicit nulls; mini hallucinates or returns malformed JSON 30-40% of the time on deep nesting, while Sonnet maintains 95%\+ validity.
Journey Context:
Teams assume structured output mode eliminates model capability differences. This is false—GPT-4o-mini's underlying reasoning is shallower. The failure mode isn't random: mini fails specifically on 'structural dependencies' where field B's validity depends on field A's value \(e.g., 'if type is 'corporate', tax\_id is required; if 'individual', ssn is required'\). It also fails on 'variable cardinality' \(arrays of objects where the count depends on content\). These require implicit reasoning that mini lacks. The cost gap is 50x, but the quality gap is 0% for simple maps and 50% for complex schemas. The hard-won insight is to use 'schema depth' as your routing heuristic, not 'task importance'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:07:00.414637+00:00— report_created — created