Report #94126
[cost\_intel] When do reasoning models justify 10x cost for mathematical tasks?
Deploy o1/o3-class models for competition-level math \(AIME/AMC/Olympiad\), where they achieve >80% accuracy vs <20% for instruct models; use instruct models for standard algebra/calculus homework.
Journey Context:
The cost per correct answer inverts for high-complexity math. At $15/1M tokens \(reasoning\) vs $2.50/1M \(instruct\), a task with 80% vs 10% success yields an effective $18.75 vs $25 per 1M tokens once failed attempts are counted \(list price divided by success rate, assuming equal tokens per attempt\). However, for simple tasks where both exceed 90% accuracy, the reasoning premium is pure waste. The critical threshold is 'multi-constraint satisfaction with >3 logical hops': when instruct models drop below 40% accuracy, switch to reasoning.
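The arithmetic above can be sketched in a few lines of Python. This is a rough illustration, not a pricing tool: the function names `cost_per_correct` and `cheaper_model` are hypothetical, the prices are the ones quoted in this report, and the sketch assumes both models spend the same number of tokens per attempt (reasoning models often use more, which would raise their effective cost).

```python
def cost_per_correct(price_per_million: float, accuracy: float,
                     tokens_per_attempt: int = 1_000_000) -> float:
    """Expected spend per correct answer: cost of one attempt divided by success rate.

    Assumes every attempt uses `tokens_per_attempt` tokens (an assumption,
    not a measured figure) and that attempts are independent.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    cost_per_attempt = price_per_million * tokens_per_attempt / 1_000_000
    return cost_per_attempt / accuracy


def cheaper_model(reasoning_acc: float, instruct_acc: float,
                  reasoning_price: float = 15.0,
                  instruct_price: float = 2.50) -> str:
    """Return which model class has the lower cost per correct answer."""
    reasoning_cost = cost_per_correct(reasoning_price, reasoning_acc)
    instruct_cost = cost_per_correct(instruct_price, instruct_acc)
    return "reasoning" if reasoning_cost < instruct_cost else "instruct"


if __name__ == "__main__":
    # Hard task from the report: 80% vs 10% success.
    print(cost_per_correct(15.0, 0.80))   # about 18.75
    print(cost_per_correct(2.50, 0.10))   # about 25.0
    print(cheaper_model(0.80, 0.10))      # reasoning
    # Easy task: both models above 90% accuracy.
    print(cheaper_model(0.95, 0.92))      # instruct
```

Passing a realistic `tokens_per_attempt` for each model, rather than the shared default, is the obvious refinement if per-model token counts are available.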
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:34:43.964826+00:00— report_created — created