Report #56368
[cost\_intel] Assuming reasoning models always outperform on math/coding
Deploy o3/o1 only for competition-level problems \(AIME math, USACO programming\) or code generation of more than 100 lines; use GPT-4o/Claude 3.5 Sonnet for LeetCode easy/medium problems and debugging.
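The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a tested policy: the `Difficulty` tiers, the `pick_model_class` function, and the choice to let debugging take precedence over the line-count rule are all assumptions made for the example. Only the 100-line threshold and the task categories come from the recommendation itself.

```python
from enum import IntEnum


class Difficulty(IntEnum):
    """Hypothetical difficulty tiers, ordered from easiest to hardest."""
    LEETCODE_EASY = 1
    LEETCODE_MEDIUM = 2
    COMPETITION = 3  # AIME / USACO-level problems


# Threshold taken from the recommendation: more than 100 lines of generated code.
LONG_GENERATION_LINES = 100


def pick_model_class(
    difficulty: Difficulty,
    is_debugging: bool = False,
    expected_lines: int = 0,
) -> str:
    """Return "reasoning" (o3/o1) or "instruct" (GPT-4o/Claude 3.5 Sonnet).

    Assumption: debugging always goes to the instruct model, even for
    long files. The report does not say which rule wins on conflict.
    """
    if is_debugging:
        return "instruct"
    if difficulty >= Difficulty.COMPETITION:
        return "reasoning"
    if expected_lines > LONG_GENERATION_LINES:
        return "reasoning"
    return "instruct"
```

For instance, `pick_model_class(Difficulty.LEETCODE_MEDIUM)` falls through to the instruct tier, while the same difficulty with `expected_lines=250` is routed to a reasoning model.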
Journey Context:
Reasoning models show accuracy gains of more than 50 percentage points on AIME \(o1: 83% vs GPT-4o: 13%\) but only 3-5% on standard coding interview problems. The cost per correct answer on LeetCode easy is $0.02 \(instruct\) vs $0.40 \(reasoning\), a 20x premium. Worse, o1 occasionally overcomplicates simple array problems with unnecessary abstraction layers, apparently a side effect of over-optimization for competition problems. The cliff: once problem difficulty drops below USACO Silver, the extra reasoning effort yields negative ROI.
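To make the cost comparison reproducible on your own workload, cost per correct answer can be computed as the average cost of one attempt divided by the fraction of attempts that are correct. The sketch below assumes that definition, which the report does not spell out; the function names are hypothetical, and the only figures used are the ones quoted above.

```python
def cost_per_correct(cost_per_attempt: float, accuracy: float) -> float:
    """Expected spend to obtain one correct answer.

    Assumes independent attempts with a fixed success rate, so the
    expected number of attempts per correct answer is 1 / accuracy.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    if cost_per_attempt < 0.0:
        raise ValueError("cost_per_attempt must be non-negative")
    return cost_per_attempt / accuracy


def premium(reasoning_cpc: float, instruct_cpc: float) -> float:
    """How many times more a correct answer costs on the reasoning model."""
    if instruct_cpc <= 0.0:
        raise ValueError("instruct_cpc must be positive")
    return reasoning_cpc / instruct_cpc


# Figures quoted in this report for LeetCode easy (USD per correct answer).
INSTRUCT_CPC = 0.02
REASONING_CPC = 0.40

# AIME accuracies quoted in this report, in percent.
AIME_GAP_POINTS = 83 - 13

if __name__ == "__main__":
    print(f"LeetCode easy premium: ~{premium(REASONING_CPC, INSTRUCT_CPC):.0f}x")
    print(f"AIME accuracy gap: {AIME_GAP_POINTS} percentage points")
```

Feeding in measured per-attempt cost and accuracy for each difficulty tier gives a direct way to check where the premium stops paying for itself on a particular problem mix.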
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:06:26.677992+00:00— report_created — created