Report #92555
[cost\_intel] When does GPT-4o mini fail at JSON extraction vs GPT-4o
Use GPT-4o mini for single-entity extraction with fewer than 10 fields and enum-constrained values. Use GPT-4o for objects nested more than 3 levels deep, for conditional schemas \(anyOf/oneOf\), and wherever null handling requires semantic reasoning. On complex schemas, mini hallucinates optional fields at 5-8x the rate of 4o, and the resulting silent data corruption can cost more than its roughly 16.7x input-price saving \($0.15 vs $2.50 per 1M tokens\) is worth.
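The routing rule above can be sketched as a small function over a JSON Schema dict. This is a minimal illustration, not a tested policy: the function names, the `needs_semantic_nulls` flag, and the choice to send every case the rule does not explicitly cover to 4o are assumptions made for the example.

```python
from typing import Any

MINI = "gpt-4o-mini"
FULL = "gpt-4o"


def object_depth(schema: dict[str, Any]) -> int:
    """Count nested object levels; a flat object has depth 1."""
    if schema.get("type") == "array":
        items = schema.get("items")
        return object_depth(items) if isinstance(items, dict) else 0
    if schema.get("type") != "object":
        return 0
    children = schema.get("properties", {}).values()
    return 1 + max((object_depth(c) for c in children), default=0)


def has_conditional(node: Any) -> bool:
    """True if anyOf/oneOf appears as a key anywhere in the schema.

    Crude on purpose: a property literally named "anyOf" would also match.
    """
    if isinstance(node, dict):
        if "anyOf" in node or "oneOf" in node:
            return True
        return any(has_conditional(v) for v in node.values())
    if isinstance(node, list):
        return any(has_conditional(v) for v in node)
    return False


def choose_model(schema: dict[str, Any], needs_semantic_nulls: bool = False) -> str:
    """Pick a model for one extraction schema.

    needs_semantic_nulls is a hypothetical caller-supplied flag, since
    that property cannot be read off the schema itself.
    """
    if needs_semantic_nulls or has_conditional(schema):
        return FULL
    if object_depth(schema) > 3:
        return FULL
    # Assumption: 10 or more top-level fields also go to 4o.
    if len(schema.get("properties", {})) >= 10:
        return FULL
    return MINI
```

A flat schema with one enum field routes to mini, while a schema with four nested object levels or any anyOf/oneOf routes to 4o. The thresholds are the ones quoted in this report and should be re-measured against your own data.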
Journey Context:
General benchmarks such as MMLU show mini close to 4o, but structured extraction depends on strict schema adherence rather than general knowledge. Mini tends to 'hallucinate fill' plausible values for missing fields instead of emitting null, particularly when the schema describes fields in verbose natural language. The input-token price gap is roughly 16.7x, but it narrows quickly: if 5% of mini extractions must be retried on 4o after schema violations, the blended input price rises from $0.15 to about $0.275 per 1M tokens, before counting retry latency or the cleanup cost of hallucinated values that still pass validation. For production pipelines, enforce schema validation \(Zod/Pydantic\) regardless of model, with automatic 4o fallback on validation failure.
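A minimal sketch of the validate-then-fall-back pattern, using only the standard library so it runs without an API key. The `call_model` callable, the ticket schema, and `validate_ticket` are hypothetical stand-ins. In a real pipeline the validator would more likely wrap a Pydantic or Zod schema, and the caught exception type should match whatever that validator raises.

```python
import json
from typing import Callable

MINI = "gpt-4o-mini"
FULL = "gpt-4o"

# Hypothetical signature: (model_name, document_text) -> raw JSON string.
ModelCall = Callable[[str, str], str]
# A validator returns the parsed record or raises ValueError.
Validator = Callable[[str], dict]


def validate_ticket(raw: str) -> dict:
    """Stand-in validator for an illustrative two-field schema."""
    record = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    extra = set(record) - {"status", "assignee"}
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    if record.get("status") not in {"open", "closed"}:
        raise ValueError("status must be 'open' or 'closed'")
    assignee = record.get("assignee")
    if assignee is not None and not isinstance(assignee, str):
        raise ValueError("assignee must be a string or null")
    return record


def extract_with_fallback(
    text: str, call_model: ModelCall, validate: Validator
) -> tuple[dict, str]:
    """Try mini first; retry once on 4o if validation fails."""
    try:
        return validate(call_model(MINI, text)), MINI
    except ValueError:
        # The 4o output is validated too; a second failure propagates.
        return validate(call_model(FULL, text)), FULL


def blended_input_price(
    fallback_rate: float, mini: float = 0.15, full: float = 2.50
) -> float:
    """USD per 1M input tokens when every call hits mini first and
    a fraction fallback_rate is retried on 4o."""
    return mini + fallback_rate * full
```

Returning the model name alongside the record lets the caller log the observed fallback rate and feed it into `blended_input_price`. Validation of this kind catches fields that are not in the schema and values outside an enum, but not a plausible invented value for an optional field, so it limits the silent-corruption risk described above without removing it.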
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:56:46.521221+00:00— report_created — created