Report #100842
[cost_intel] Can I replace GPT-4o with GPT-4o-mini without quality collapse?
GPT-4o-mini is roughly 17x cheaper than GPT-4o at list price ($0.15 input / $0.60 output versus $2.50 input / $10.00 output per million tokens) and holds up for classification, simple extraction, formatting, and straightforward Q&A. It degrades on multi-step reasoning, ambiguous instructions, code generation, and tasks requiring broad world knowledge or careful refusal. Benchmark on your own eval before switching; aggregate benchmarks often overstate real-task performance.
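To make the price gap concrete, here is a minimal sketch that applies the per-MTok prices quoted above to a workload. It assumes the first figure in each pair is the input price and the second is the output price; the function names and the example token volumes are hypothetical, not taken from any real deployment.

```python
# Per-million-token prices in USD as quoted above: (input, output).
# Assumption: the first number of each pair is input, the second is output.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the quoted per-MTok prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def cost_ratio(input_tokens: int, output_tokens: int) -> float:
    """Return how many times more GPT-4o costs than GPT-4o-mini for a workload."""
    return (workload_cost("gpt-4o", input_tokens, output_tokens)
            / workload_cost("gpt-4o-mini", input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical monthly volume: 500M input tokens, 100M output tokens.
    tokens_in, tokens_out = 500_000_000, 100_000_000
    for model in PRICES:
        print(f"{model}: ${workload_cost(model, tokens_in, tokens_out):.2f}")
    print(f"ratio: {cost_ratio(tokens_in, tokens_out):.1f}x")
```

Because both the input and the output price differ by the same factor at these prices, the ratio does not depend on the input/output mix. What it leaves out is quality: the saving only counts on tasks where mini's output is still acceptable.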
Journey Context:
The cost gap is so large that teams want mini to be a drop-in replacement. It is not. Mini's failure mode is not obvious errors; it is confidently completing the wrong interpretation of an ambiguous prompt, or producing syntactically valid but semantically shallow code. A practical migration is to use mini as a first-pass filter or for high-volume, low-stakes tasks, and keep GPT-4o (or a larger model) as a verifier for uncertain cases. Without an eval that matches production data, you will not notice the degradation until users complain.
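One possible shape for the first-pass/verifier split is a small cascade, sketched below. Everything here is an illustrative assumption: `route`, `cheap`, `strong`, and `min_agreement` are hypothetical names, the two models are passed in as plain callables rather than real API clients, and "uncertain" is approximated by disagreement between repeated samples from the cheap model (which only makes sense when sampling is non-deterministic and answers are short, such as labels).

```python
from collections import Counter
from typing import Callable

# A model is any callable that maps a prompt to a text answer.
# In practice these would wrap API calls; here they are injected.
Model = Callable[[str], str]


def route(prompt: str, cheap: Model, strong: Model,
          samples: int = 3, min_agreement: float = 1.0) -> tuple[str, str]:
    """Answer with the cheap model unless its samples disagree.

    Returns (answer, tier), where tier is "cheap" or "strong".
    """
    # Sample the cheap model several times and normalize the answers.
    answers = [cheap(prompt).strip().lower() for _ in range(samples)]
    top_answer, count = Counter(answers).most_common(1)[0]

    # Enough agreement: treat the case as certain and keep the cheap answer.
    if count / samples >= min_agreement:
        return top_answer, "cheap"

    # Disagreement: treat the case as uncertain and escalate.
    return strong(prompt), "strong"
```

Sample agreement is only a proxy. It will not catch the case described above where mini settles consistently on the wrong interpretation, so the escalation rate and the accuracy of the "cheap" tier still need to be measured on an eval drawn from production data.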
Lifecycle
2026-07-02T05:11:32.431450+00:00— report_created — created