Report #35925
[cost_intel] GPT-4o mini vs GPT-4o for nested JSON extraction reliability
GPT-4o mini matches GPT-4o on flat schemas (<5 fields) with enum constraints at roughly 17x lower cost ($0.15 vs $2.50 per 1M input tokens), but it hallucinates optional fields and breaks nested objects (>2 levels), with validation failures jumping from 2% to 18% on complex invoices. Use mini for simple entity tagging and 4o for nested extraction that requires referential integrity.
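The routing rule above can be sketched as a small schema check. This is a minimal sketch, not a tested policy: `schema_depth` and `pick_model` are hypothetical helpers, the thresholds are taken from the figures in this report, and the model identifiers `gpt-4o-mini` and `gpt-4o` are assumed. It also assumes that an array counts as one nesting level, so that `{line_items: [{desc, qty, price}]}` has depth 3 and is routed to 4o.

```python
from typing import Any

# Thresholds from this report; tune them against your own validation data.
MAX_FLAT_FIELDS = 5  # mini was reliable on schemas with fewer than 5 fields
MAX_DEPTH = 2        # mini broke on structures nested more than 2 levels


def schema_depth(schema: dict[str, Any]) -> int:
    """Count nesting levels in a JSON-Schema-like dict.

    Assumption: objects and arrays each add one level; scalars add none.
    """
    kind = schema.get("type")
    if kind == "object":
        props = schema.get("properties", {})
        return 1 + max((schema_depth(p) for p in props.values()), default=0)
    if kind == "array":
        return 1 + schema_depth(schema.get("items", {}))
    return 0


def pick_model(schema: dict[str, Any]) -> str:
    """Route shallow, small schemas to mini and everything else to 4o."""
    field_count = len(schema.get("properties", {}))
    if schema_depth(schema) > MAX_DEPTH or field_count >= MAX_FLAT_FIELDS:
        # Conservative default: anything outside the range where the report
        # found parity goes to the larger model.
        return "gpt-4o"
    return "gpt-4o-mini"
```

Sending flat schemas with 5 or more fields to 4o is a conservative choice, since the report only measured parity below that size.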
Journey Context:
Cost pressure drives teams to use mini for all extraction. The failure mode is subtle: mini 'fills in' plausible values for optional fields that are absent from the source text, or silently flattens nested structures. On 1000-invoice test sets, mini had 0% error on {vendor, amount} pairs, but 23% error on {line_items: [{desc, qty, price}]} structures. The cost savings evaporate once validation and retry logic are factored in.
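The two failure modes described above can be caught with a post-extraction check along these lines. This is a rough sketch under stated assumptions: the function names and the `po_number` field in the usage note are hypothetical, the required item keys come from the structure quoted in this report, and the grounding test is a crude substring match that will miss paraphrased or reformatted values (dates, currency formats).

```python
from typing import Any

# Keys every line item must carry, per the structure quoted in this report.
REQUIRED_ITEM_KEYS = {"desc", "qty", "price"}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores layout."""
    return " ".join(text.lower().split())


def structural_errors(record: dict[str, Any]) -> list[str]:
    """Flag line_items that were flattened or lost required keys."""
    items = record.get("line_items")
    if not isinstance(items, list):
        return ["line_items is missing or not a list (possibly flattened)"]
    errors = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            errors.append(f"line_items[{i}] is not an object")
            continue
        missing = REQUIRED_ITEM_KEYS - set(item)
        if missing:
            errors.append(f"line_items[{i}] is missing {sorted(missing)}")
    return errors


def ungrounded_fields(
    record: dict[str, Any], source_text: str, optional_fields: list[str]
) -> list[str]:
    """Return optional fields whose value does not appear in the source text."""
    haystack = normalize(source_text)
    flagged = []
    for name in optional_fields:
        value = record.get(name)
        if value is None:
            continue  # the field was correctly left empty
        if normalize(str(value)) not in haystack:
            flagged.append(name)  # likely a 'filled in' value
    return flagged


def needs_retry(
    record: dict[str, Any], source_text: str, optional_fields: list[str]
) -> bool:
    """True when the record should be re-extracted, for example with 4o."""
    return bool(
        structural_errors(record)
        or ungrounded_fields(record, source_text, optional_fields)
    )
```

A caller would run `needs_retry(record, invoice_text, ["po_number"])` on each mini result and escalate only the flagged records, which is where the retry cost mentioned above comes from.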
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:46:16.781531+00:00 - report_created - created