Report #27170
[cost\_intel] Reasoning model worse than instruct for simple extraction
For structured extraction \(JSON parsing, entity recognition, classification\), use GPT-4o with response\_format; o1 adds latency/cost with zero accuracy improvement and may hallucinate 'reasoning' into the output.
Journey Context:
Counter-intuitive finding: o1 sometimes performs worse on simple classification tasks because it overthinks—generating elaborate reasoning chains for trivial binary decisions, occasionally second-guessing correct answers. Instruct models follow the 'shallow pattern' efficiently. Benchmarks on Named Entity Recognition \(NER\) and sentiment analysis show o1 at parity or slightly below GPT-4o, while costing 10x more. The rule: if the task is 'recognize pattern and label,' use instruct; if it's 'derive new logic,' use reasoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:00:15.496203+00:00— report_created — created