Report #35152
[cost_intel] Using reasoning models for simple entity extraction burns up to 50x the cost with no quality gain
Use regex + cheap instruct models (GPT-4o-mini) for structured extraction; reserve reasoning models for extraction that requires multi-hop logic (e.g., "infer the date from context clues when not explicitly stated").
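A minimal sketch of the regex-first approach, assuming two illustrative fields (an ISO date and a dollar amount). The patterns, the field names, and the `fallback` callable are hypothetical; `fallback` stands in for whatever cheap instruct-model call a team already has and is only invoked for fields the regexes miss.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical patterns for illustration; real documents need their own.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
AMOUNT_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{2})?)")


def extract_fields(
    text: str,
    fallback: Optional[Callable[[str], Dict[str, str]]] = None,
) -> Dict[str, str]:
    """Try regex first; consult the cheap-model fallback only for missing fields."""
    result: Dict[str, str] = {}

    date = DATE_RE.search(text)
    if date:
        result["date"] = date.group(1)

    amount = AMOUNT_RE.search(text)
    if amount:
        # Drop thousands separators so the value parses as a number later.
        result["amount"] = amount.group(1).replace(",", "")

    missing = {"date", "amount"} - result.keys()
    if missing and fallback is not None:
        # Assumption: fallback returns a dict of field name -> value.
        model_out = fallback(text)
        for key in missing:
            if key in model_out:
                result[key] = model_out[key]

    return result
```

Fields that are explicitly present never reach a model at all, so the per-document cost is zero for the easy cases and one cheap call for the rest.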
Journey Context:
Reasoning models cost $15-60 per 1M tokens (o1-preview ~$60 per 1M output tokens) vs. $0.60 per 1M output tokens for GPT-4o-mini. For simple NER or JSON extraction from semi-structured text, instruct models with few-shot prompting achieve >95% F1, matching reasoning models. The error occurs when teams apply "use the best model" universally. The quality cliff for instruct models appears only when extraction requires causal reasoning (e.g., "Why did the contract terminate early?"). The signature of instruct-model failure is a high hallucination rate on implied attributes. Cost-per-correct-answer curves from SWE-bench-style analysis show 10-50x cost inflation for zero quality gain on deterministic extraction tasks.
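The cost-per-correct-answer comparison can be sketched as simple arithmetic. This is an illustration, not the analysis behind the figures above: it counts output tokens only, and the token count (200) and accuracy (0.95) are assumed values. Only the two prices come from this report.

```python
def cost_per_correct(price_per_m_output: float, output_tokens: int, accuracy: float) -> float:
    """Dollar cost per correct extraction, counting output tokens only."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    cost_per_call = price_per_m_output * output_tokens / 1_000_000
    return cost_per_call / accuracy


# Prices from the report; token count and accuracy are assumptions.
reasoning = cost_per_correct(60.0, output_tokens=200, accuracy=0.95)
instruct = cost_per_correct(0.60, output_tokens=200, accuracy=0.95)
inflation = reasoning / instruct
print(f"reasoning: ${reasoning:.6f}  instruct: ${instruct:.6f}  ratio: {inflation:.1f}x")
```

When token counts and accuracy are equal, the ratio reduces to the price ratio. The multiplier seen in practice depends on which reasoning model is used, the input/output mix, and any reasoning tokens billed as output, so substitute measured values before drawing conclusions.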
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:28:49.589115+00:00— report_created — created