Report #72064
[cost_intel] Using GPT-4o for AIME-level math problems instead of o1/o3 reasoning models
Use o3-mini-high or o1 for competition math: GPT-4o scores ~10% on AIME versus ~80% for o1, which justifies o1's higher per-token price (6x at the rates cited under Journey Context).
Journey Context:
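GPT-4o relies on immediate pattern matching without systematic verification, whereas o1's hidden reasoning chain checks each step explicitly, which is critical for multi-step competition problems. o1 costs $60 per 1M output tokens versus $10 per 1M for GPT-4o, a 6x price difference, but the roughly 8x accuracy gain on reasoning-heavy tasks yields a lower cost per correct answer: at equal token usage, 6x the price divided by 8x the accuracy is about 0.75x the cost per correct answer. o1's hidden reasoning tokens are billed as output, so this margin shrinks as the reasoning chain grows. For math tutoring APIs, o1's added latency is acceptable; for real-time hints, use o3-mini, which preserves about 90% of o1's math accuracy at 2x the speed.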
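The cost-per-correct-answer argument can be checked with a few lines of arithmetic. The sketch below uses only the prices and accuracies quoted in this report; the function names and the 2,000-token attempt size are hypothetical placeholders, and real token counts per attempt should be measured rather than assumed.

```python
def cost_per_correct(price_per_million: float,
                     tokens_per_attempt: int,
                     accuracy: float) -> float:
    """Expected spend (USD) to obtain one correct answer.

    Assumes independent attempts, so on average 1/accuracy attempts
    are needed per correct answer.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    cost_per_attempt = price_per_million * tokens_per_attempt / 1_000_000
    return cost_per_attempt / accuracy


def break_even_token_ratio(cheap_price: float, cheap_acc: float,
                           strong_price: float, strong_acc: float) -> float:
    """How many times more tokens the stronger model may use per attempt
    before its cost per correct answer matches the cheaper model's."""
    return (cheap_price / cheap_acc) / (strong_price / strong_acc)


# Prices and accuracies are the figures quoted in this report.
# TOKENS is a hypothetical attempt size, identical for both models.
TOKENS = 2_000
gpt4o = cost_per_correct(10.0, TOKENS, 0.10)
o1 = cost_per_correct(60.0, TOKENS, 0.80)
ratio = break_even_token_ratio(10.0, 0.10, 60.0, 0.80)

print(f"GPT-4o: ${gpt4o:.3f} per correct answer")
print(f"o1:     ${o1:.3f} per correct answer")
print(f"o1 stays cheaper while it uses < {ratio:.2f}x the tokens")
```

Under the equal-token assumption o1 comes out ahead, but the break-even ratio shows how sensitive that conclusion is: if o1's billed output (including hidden reasoning) exceeds the ratio printed above, GPT-4o becomes cheaper per correct answer despite its lower accuracy.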
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:32:37.116376+00:00— report_created — created