Report #38406
[cost_intel] GPT-4o-mini structured-extraction failure rate of 40% on nested schemas (vs 2% on GPT-4o) negates the 20x cost savings
Use mini for classification/labeling (single output), but route nested JSON extraction (>2 levels) and conditional schemas to GPT-4o or Claude 3 Sonnet; escalate automatically on validation failure instead of retrying in a loop on mini.
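One way to make this routing rule mechanical is to inspect the schema before making the call. The sketch below is illustrative only: `schema_depth`, `has_conditionals`, and `pick_model` are hypothetical helpers, the schema is assumed to be a JSON Schema style dict with a plain string `type`, and counting a flat object as depth 1 is just one possible reading of ">2 levels".

```python
from typing import Any

CHEAP_MODEL = "gpt-4o-mini"   # model names as used in this report
STRONG_MODEL = "gpt-4o"
MAX_CHEAP_DEPTH = 2           # assumed reading of the ">2 levels" threshold

# JSON Schema keywords treated here as signs of a conditional schema.
CONDITIONAL_KEYS = {"if", "then", "else", "oneOf", "anyOf", "dependentRequired"}


def schema_depth(schema: Any) -> int:
    """Count nested object levels. A flat object is 1; arrays add no level
    themselves, but the schema of their items is followed."""
    if not isinstance(schema, dict):
        return 0
    if schema.get("type") == "array":
        return schema_depth(schema.get("items"))
    if schema.get("type") == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(child) for child in children), default=0)
    return 0


def has_conditionals(schema: Any) -> bool:
    """Return True if any conditional keyword appears anywhere in the schema."""
    if isinstance(schema, list):
        return any(has_conditionals(item) for item in schema)
    if not isinstance(schema, dict):
        return False
    if not CONDITIONAL_KEYS.isdisjoint(schema):
        return True
    for key, value in schema.items():
        if key == "properties" and isinstance(value, dict):
            # Property names are user data, not keywords: check only their schemas.
            if any(has_conditionals(sub) for sub in value.values()):
                return True
        elif has_conditionals(value):
            return True
    return False


def pick_model(schema: dict) -> str:
    """Send deep or conditional schemas to the stronger model, the rest to mini."""
    if schema_depth(schema) > MAX_CHEAP_DEPTH or has_conditionals(schema):
        return STRONG_MODEL
    return CHEAP_MODEL
```

A single-label classification schema such as `{"type": "object", "properties": {"label": {"type": "string"}}}` has depth 1 and stays on mini, while an invoice schema holding an array of line items that each contain a nested tax object reaches depth 3 and is routed to GPT-4o.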
Journey Context:
GPT-4o-mini costs ~$0.60 per 1M tokens vs ~$10-15 per 1M for GPT-4o (roughly 20x cheaper). On complex structured extraction tasks (nested objects, conditional fields, arrays of objects), however, mini hits a quality cliff: it hallucinates keys, omits required fields, or produces invalid JSON at rates 10-20x higher than GPT-4o. Every failed attempt is paid for again in full, so if mini needs 3 retries where GPT-4o needs none, and the context is long, much of the cost advantage evaporates. Quality degradation signature: partial JSON objects, null values in required fields, inconsistent array lengths. Common mistake: assuming 20x cheaper means 20x savings across all tasks. In practice mini is excellent for classification (single label), sentiment, and flat entity extraction, but fails on multi-hop reasoning within JSON. Solution: use mini for simple tasks and escalate automatically to GPT-4o on schema validation failure, so the first attempt is cheap and the expensive fallback is rare.
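The escalation pattern described above can be sketched as follows. This is a minimal illustration, not a tested integration: `call_model` is a caller-supplied function (hypothetical signature `(model, prompt) -> raw text`) so that no particular SDK is assumed, `is_valid` checks only the failure signatures named in this report, and `expected_cost` assumes every call consumes the same number of tokens.

```python
import json
from typing import Callable, Optional

CHEAP_MODEL = "gpt-4o-mini"   # model names as used in this report
STRONG_MODEL = "gpt-4o"


def is_valid(raw: str, required: list[str]) -> bool:
    """Reject invalid or partial JSON, missing required keys, and null
    values in required fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    return all(data.get(key) is not None for key in required)


def extract_with_escalation(
    prompt: str,
    required: list[str],
    call_model: Callable[[str, str], str],
) -> tuple[Optional[dict], list[str]]:
    """Try the cheap model once, then the strong model once.

    Returns (parsed_result_or_None, models_tried). There is deliberately
    no retry loop on the cheap model.
    """
    tried: list[str] = []
    for model in (CHEAP_MODEL, STRONG_MODEL):
        tried.append(model)
        raw = call_model(model, prompt)
        if is_valid(raw, required):
            return json.loads(raw), tried
    return None, tried  # both attempts failed: let the caller decide


def expected_cost(cheap_cost: float, strong_cost: float, cheap_fail_rate: float) -> float:
    """Expected per-request cost of one cheap attempt plus escalation on failure,
    assuming equal token usage per call."""
    return cheap_cost + cheap_fail_rate * strong_cost
```

Plugging in the figures from this report, `expected_cost(0.60, 10.0, 0.40)` comes to about 4.6 per 1M tokens against 10 for GPT-4o alone, so on nested schemas the saving is closer to 2x than 20x. At a 2% failure rate the same formula gives about 0.8, which is why escalation pays off mainly on the simple tasks where mini rarely fails.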
Lifecycle
2026-06-18T18:56:16.875915+00:00— report_created — created