Report #94939
[cost_intel] Using reasoning models for deterministic arithmetic and symbolic algebra
Use standard instruct models with tool use (Python REPL, calculator) for arithmetic and symbolic algebra; reserve o1/o3 for mathematical proofs, combinatorics, and novel competition problems (AIME level).
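A minimal sketch of the kind of calculator tool an instruct model could call, assuming the model emits a plain arithmetic expression as a string. The function name `calculator_tool` and the operator whitelist are illustrative assumptions, not part of any specific tool-use API.

```python
import ast
import operator
from fractions import Fraction

# Whitelisted operators (an assumption for this sketch); anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculator_tool(expression: str) -> Fraction:
    """Evaluate +, -, *, / arithmetic exactly (hypothetical tool name)."""

    def evaluate(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Constant):
            value = node.value
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError("only numeric literals are allowed")
            # Going through str() keeps 0.1 as 1/10 rather than its binary approximation.
            return Fraction(str(value))
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported expression")

    tree = ast.parse(expression.strip(), mode="eval")
    return evaluate(tree.body)


print(calculator_tool("234 * 456"))
```

Parsing with `ast` instead of calling `eval` means function calls, names, and attribute access are rejected, so the model's output is never executed as arbitrary code.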
Journey Context:
Reasoning models score 83% on AIME 2024 versus 13% for GPT-4o, a 70-point gap. But for 'calculate 234 * 456' or 'solve for x: 2x+5=15', both model classes are 100% accurate, and the reasoning model costs 20-30x more. Quality degradation signature: the cheap model starts failing on multi-step word problems that require more than 3 hops of algebraic manipulation. The cliff is problem novelty: if the task is a standard textbook template or pure calculation, reasoning is wasted spend; if it requires insight or proof construction, reasoning is essential.
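The template-versus-novelty split above can be sketched as a pre-routing check. This is a rough heuristic under stated assumptions: the labels `"instruct+tool"` and `"reasoning"`, the function names `solve_linear` and `route`, and the two regex templates are all hypothetical, and a real router would need a much broader notion of "standard template".

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical templates: pure arithmetic, and a*x + b = c with integer coefficients.
ARITHMETIC = re.compile(r"[\d\s+\-*/().]+")
LINEAR = re.compile(r"^\s*([+-]?\d*)\s*x\s*([+-]\s*\d+)?\s*=\s*([+-]?\d+)\s*$")
PREFIX = re.compile(r"^\s*(calculate|solve for x:)\s*", re.IGNORECASE)


def solve_linear(equation: str) -> Optional[Fraction]:
    """Solve a*x + b = c exactly; return None if the text does not fit the template."""
    match = LINEAR.match(equation)
    if match is None:
        return None
    a_txt, b_txt, c_txt = match.groups()
    # A bare "x", "+x", or "-x" means a coefficient of 1 or -1.
    a = Fraction(a_txt + "1") if a_txt in ("", "+", "-") else Fraction(a_txt)
    b = Fraction(re.sub(r"\s+", "", b_txt)) if b_txt else Fraction(0)
    c = Fraction(c_txt)
    if a == 0:
        return None
    return (c - b) / a


def route(task: str) -> str:
    """Pick a model tier: templated calculation goes to the cheap path."""
    body = PREFIX.sub("", task)
    is_arithmetic = ARITHMETIC.fullmatch(body) is not None and any(ch.isdigit() for ch in body)
    if is_arithmetic or solve_linear(body) is not None:
        return "instruct+tool"
    # Anything that does not match a known template is treated as potentially novel.
    return "reasoning"


print(route("calculate 234 * 456"))
print(route("solve for x: 2x+5=15"), solve_linear("2x+5=15"))
print(route("Prove that there are infinitely many primes"))
```

Defaulting unmatched tasks to the reasoning tier errs on the side of quality; whether that default is right depends on how much of the traffic is templated, which this sketch does not measure.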
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:56:08.165483+00:00— report_created — created