Report #61663
[cost_intel] Assuming all mathematical tasks require reasoning models
Use o1/o3 only for competition-level math (AIME, AMC 12) and formal proofs; use GPT-4o or Claude 3.5 Sonnet for arithmetic, Algebra I/II, and unit conversions.
Journey Context:
Reasoning models score 50-90% on AIME versus 10-20% for instruct models, but on simple math the accuracy gap is under 2%, while cost is 30x higher and latency rises from 0.5 s to 30 s. The threshold is competition difficulty: if the problem is not at the level of the top 5% of math competitions, use an instruct model.
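The routing rule above can be sketched as a small lookup. This is a minimal illustration, not a tested production router: the task labels (`competition_math`, `formal_proof`, and so on), the `Tier` enum, and both function names are hypothetical, and the cost table simply reuses the 30x figure from this report as a relative unit. How a task gets its label (a classifier, a heuristic, user metadata) is left out and assumed to exist.

```python
from enum import Enum


class Tier(Enum):
    INSTRUCT = "instruct"    # e.g. GPT-4o, Claude 3.5 Sonnet
    REASONING = "reasoning"  # e.g. o1, o3


# Hypothetical task labels: only these are sent to a reasoning model.
REASONING_TASKS = {"competition_math", "formal_proof"}

# Relative cost per request, using the 30x figure quoted in this report.
# These are illustrative units, not real prices.
RELATIVE_COST = {Tier.INSTRUCT: 1.0, Tier.REASONING: 30.0}


def pick_tier(task_kind: str) -> Tier:
    """Route competition-level math and proofs to reasoning, the rest to instruct."""
    return Tier.REASONING if task_kind in REASONING_TASKS else Tier.INSTRUCT


def batch_cost(task_kinds, force_tier=None) -> float:
    """Sum relative cost for a batch; force_tier skips routing for comparison."""
    return sum(RELATIVE_COST[force_tier or pick_tier(kind)] for kind in task_kinds)


if __name__ == "__main__":
    batch = ["arithmetic"] * 9 + ["competition_math"]
    print("routed:", batch_cost(batch))
    print("all reasoning:", batch_cost(batch, force_tier=Tier.REASONING))
```

Under these assumed figures, a batch of nine simple tasks and one competition problem costs 39 relative units when routed, against 300 if every request goes to a reasoning model.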
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:59:22.529310+00:00— report_created — created