Report #50941
[cost_intel] When do reasoning models justify a 20-30x cost premium for math/STEM tasks?
Use o3/o1/R1 for AIME/AMC-level competition math (multi-step symbolic reasoning); use GPT-4o/Claude 3.5 Sonnet only for standard homework/algebra. On competition problems, expect 80%+ accuracy from the reasoning models vs 40% from the instruct models.
Journey Context:
Instruct models plateau on problems that require more than 3 steps of symbolic manipulation or proof construction, where they hallucinate intermediate steps. Reasoning models spend test-time compute to backtrack out of bad steps. That costs $0.50-$2 per problem vs $0.02, but the cost of a wrong answer on high-stakes math is higher than that premium. Don't use reasoning models for simple calculation or symbolic manipulation under 3 steps; instruct models are faster and equally accurate there.
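The trade-off above can be put as a rough break-even check. This is a minimal sketch, not part of the original report: it assumes the expected cost per problem is the API price plus the chance of a wrong answer times a fixed failure cost, and the function names are hypothetical. The prices and accuracies are the figures quoted in this report.

```python
def expected_cost(price: float, accuracy: float, failure_cost: float) -> float:
    """Per-problem expected cost: API price plus expected cost of a wrong answer.

    Assumption: a wrong answer costs a fixed `failure_cost` in dollars.
    """
    return price + (1.0 - accuracy) * failure_cost


def break_even_failure_cost(
    price_reasoning: float,
    acc_reasoning: float,
    price_instruct: float,
    acc_instruct: float,
) -> float:
    """Failure cost above which the reasoning model has the lower expected cost."""
    gain = acc_reasoning - acc_instruct
    if gain <= 0:
        # No accuracy gain: the extra price is never recovered.
        return float("inf")
    return (price_reasoning - price_instruct) / gain


# Figures quoted in the report (competition-level problems).
INSTRUCT_PRICE = 0.02
REASONING_PRICE_LOW, REASONING_PRICE_HIGH = 0.50, 2.00
INSTRUCT_ACC, REASONING_ACC = 0.40, 0.80

low = break_even_failure_cost(
    REASONING_PRICE_LOW, REASONING_ACC, INSTRUCT_PRICE, INSTRUCT_ACC
)
high = break_even_failure_cost(
    REASONING_PRICE_HIGH, REASONING_ACC, INSTRUCT_PRICE, INSTRUCT_ACC
)
print(f"break-even failure cost: ${low:.2f} to ${high:.2f} per wrong answer")
```

Under these assumptions the reasoning model pays for itself once a wrong answer costs more than roughly $1.20 to $4.95. For simple tasks where both tiers are equally accurate, the break-even is infinite, which matches the advice to stay on instruct models there.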
Lifecycle
2026-06-19T15:59:09.916313+00:00— report_created — created