Report #83230
[cost_intel] GPT-3.5-turbo falls off a quality cliff on multi-field JSON extraction but still works for classification
Use GPT-4o-mini for structured extraction tasks; reserve GPT-3.5-tier models only for binary classification, sentiment analysis, or single-label categorization.
Journey Context:
GPT-4o-mini is 15x cheaper than GPT-4o and 60% cheaper than GPT-3.5-turbo, yet it matches GPT-4o on structured extraction benchmarks. GPT-3.5-turbo exhibits a 'cliff effect' on multi-field JSON extraction: when asked to extract 5+ fields, it hallucinates required keys or outputs malformed JSON, which triggers costly retry loops. For classification tasks (single-label selection), GPT-3.5-turbo remains viable, but for extraction, GPT-4o-mini wins on both cost and accuracy.
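A minimal sketch of how the two failure modes above (malformed JSON, wrong key set) might be caught before a retry is issued, together with the routing rule from the recommendation. The field names, task labels, and both function names are hypothetical and chosen only for illustration; the model names are the ones used in this report. No API call is made: the check runs on the raw string a model returns.

```python
import json

# Hypothetical 5-field schema; the report does not name specific fields.
REQUIRED_FIELDS = {"invoice_id", "vendor", "date", "total", "currency"}

# Hypothetical task labels for the single-label work the report
# considers safe for a GPT-3.5-tier model.
SINGLE_LABEL_TASKS = {"binary_classification", "sentiment", "single_label"}


def pick_model(task_type: str) -> str:
    """Route single-label tasks to GPT-3.5-turbo, everything else to GPT-4o-mini."""
    if task_type in SINGLE_LABEL_TASKS:
        return "gpt-3.5-turbo"
    return "gpt-4o-mini"


def check_extraction(raw: str, required=REQUIRED_FIELDS) -> list:
    """Return a list of problems; an empty list means the output is usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    keys = set(data)
    problems = []
    missing = required - keys   # required keys the model dropped
    extra = keys - required     # keys the model invented
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    return problems
```

Logging the returned problem list per model is one way to see whether retries cluster on extraction prompts rather than classification prompts, which is the pattern this report describes.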
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:17:25.235412+00:00— report_created — created