Report #49598
[cost_intel] Using reasoning models for competition math or coding contests
Use o1/o3 for AIME/Codeforces-level problems; accept 20-30x cost premium. Use GPT-4o only for standard interview questions (LeetCode Easy/Medium).
Journey Context:
On AIME 2024, GPT-4o achieves ~13% pass@1 while o1 reaches 83%. The price gap is $15 vs $0.60 per 1M tokens (o1 vs GPT-4o, a 25x difference), but the per-correct-answer cost favors o1 by 3-5x. However, for simple coding tasks where GPT-4o already achieves >90%, o1 provides no accuracy gain while adding 10-60s of latency. The deciding factor is task difficulty: once GPT-4o accuracy drops below 40%, reasoning models justify the cost.
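The per-correct-answer comparison and the 40% routing rule can be sketched in a few lines of Python. This is a rough illustration, not the method behind the figures above. The prices and pass@1 rates are the ones quoted in this report. The names `ModelProfile`, `cost_per_correct`, and `pick_model`, the 2,000-token attempt size, and the `overhead_per_attempt` parameter are all hypothetical. The overhead stands for any fixed per-attempt cost, such as reviewing or testing a candidate answer. The report does not say how many tokens each model uses or which overheads it counts, and the result is sensitive to both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    price_per_mtok: float  # USD per 1M tokens, as quoted in the report
    pass_at_1: float       # fraction of problems solved on the first attempt


def cost_per_correct(
    model: ModelProfile,
    tokens_per_attempt: int,
    overhead_per_attempt: float = 0.0,  # hypothetical fixed cost per attempt (USD)
) -> float:
    """Expected USD spent per correct answer when retrying until success.

    Assumes independent attempts, so the expected number of tries
    is 1 / pass_at_1.
    """
    if not 0.0 < model.pass_at_1 <= 1.0:
        raise ValueError("pass_at_1 must be in (0, 1]")
    token_cost = model.price_per_mtok * tokens_per_attempt / 1_000_000
    return (token_cost + overhead_per_attempt) / model.pass_at_1


def pick_model(baseline_accuracy: float, threshold: float = 0.40) -> str:
    """Route to the reasoning model when the baseline falls below the threshold."""
    return "o1" if baseline_accuracy < threshold else "gpt-4o"


# Figures from the report (AIME 2024); token counts below are assumed.
GPT_4O = ModelProfile("gpt-4o", price_per_mtok=0.60, pass_at_1=0.13)
O1 = ModelProfile("o1", price_per_mtok=15.00, pass_at_1=0.83)

for overhead in (0.0, 0.05):
    for model in (GPT_4O, O1):
        usd = cost_per_correct(model, tokens_per_attempt=2_000,
                               overhead_per_attempt=overhead)
        print(f"overhead=${overhead:.2f} {model.name}: ${usd:.4f} per correct answer")

print(pick_model(0.13))  # AIME-level baseline accuracy
print(pick_model(0.90))  # simple coding tasks
```

With these assumed inputs, token price and pass@1 alone leave GPT-4o cheaper per correct answer. The comparison swings toward o1 only once a fixed per-attempt overhead is included, because failed attempts then carry a cost that the low-accuracy model pays far more often. Measure real token usage and overheads for your own workload before relying on either result.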
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:44:11.518137+00:00— report_created — created