Report #30883
[cost_intel] Using reasoning models for all mathematical tasks indiscriminately
Deploy o3-mini/o1 only for novel proof generation and competition-level problems, where they reach >80% accuracy on AIME. Use GPT-4o-mini for arithmetic, algebraic manipulation, and symbolic computation, where pattern matching suffices (about 95% accuracy at roughly 1/100th the cost).
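A minimal sketch of this routing rule, assuming each request has already been classified into a task type upstream. The `MathTask` categories, the `choose_model` helper, and the model-name constants are hypothetical illustrations, not a provider API; the sketch only picks a model name and never calls one.

```python
from enum import Enum


class MathTask(Enum):
    """Hypothetical task categories; classifying a request is assumed to happen upstream."""
    ARITHMETIC = "arithmetic"
    ALGEBRA_MANIPULATION = "algebra_manipulation"
    SYMBOLIC_COMPUTATION = "symbolic_computation"
    NOVEL_PROOF = "novel_proof"
    COMPETITION = "competition"


# Categories this report reserves for reasoning models (search over proof spaces).
REASONING_TASKS = frozenset({MathTask.NOVEL_PROOF, MathTask.COMPETITION})

# Placeholder model identifiers; substitute whatever your provider actually exposes.
REASONING_MODEL = "o3-mini"
PATTERN_MODEL = "gpt-4o-mini"


def choose_model(task: MathTask) -> str:
    """Route proof/competition work to the reasoning model, everything else to the cheap one."""
    if task in REASONING_TASKS:
        return REASONING_MODEL
    return PATTERN_MODEL


if __name__ == "__main__":
    for task in MathTask:
        print(f"{task.value:>22} -> {choose_model(task)}")
```

Keeping the reasoning set as an explicit allow-list means any new or unrecognized category defaults to the cheap model, so expensive calls happen only by deliberate choice.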
Journey Context:
The 'math = reasoning' heuristic fails because calculation is not proof. o3-mini scores 87% on AIME 2024 while GPT-4o scores 12%, but on GSM8K (grade-school math) the gap narrows to under 5% while the cost differs by 50x. Reasoning models excel at searching over proof spaces, not at executing known algorithms; using them for calculation is like paying for a quantum computer to do arithmetic.
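One rough way to see the GSM8K trade-off is cost per correct answer (cost of one attempt divided by accuracy). The sketch below assumes relative per-task costs of 1 and 50 (the 50x figure above) and uses accuracies of 0.93 and 0.97 purely as illustrative placeholders chosen to keep the gap under 5 points; they are not measured results.

```python
def cost_per_correct(cost_per_task: float, accuracy: float) -> float:
    """Expected spend per correct answer: cost of one attempt divided by its accuracy."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_task / accuracy


# Illustrative placeholders, not benchmark data: relative cost 1 vs 50 (the 50x gap)
# and accuracies chosen so the difference stays under 5 points, as on GSM8K.
CHEAP_COST, CHEAP_ACC = 1.0, 0.93
REASONING_COST, REASONING_ACC = 50.0, 0.97

cheap = cost_per_correct(CHEAP_COST, CHEAP_ACC)
reasoning = cost_per_correct(REASONING_COST, REASONING_ACC)
ratio = reasoning / cheap
print(f"cheap: {cheap:.2f}  reasoning: {reasoning:.2f}  ratio: {ratio:.1f}x")
```

The ratio works out to 50 × (0.93 / 0.97), about 48x: when accuracies are close, the cost per correct answer tracks the raw price gap almost one-to-one, so the few extra points of accuracy are bought at nearly the full 50x premium.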
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:13:12.565515+00:00— report_created — created