Report #43892
[cost_intel] Using GPT-4o for competition-level math (AIME/IMO) expecting >80% pass@1
Use o3-mini-high or o1 for AIME-level problems; accept a 30-50x cost increase for a roughly 6x accuracy gain (13% → 83% on AIME 2024)
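A quick way to sanity-check this trade-off is to compare cost per correct answer rather than cost per call. The sketch below is a minimal Python illustration. It assumes each problem is attempted once, that the 30x and 50x multipliers apply uniformly per call, and that a GPT-4o call costs 1.0 (a hypothetical normalisation). The function names are illustrative only.

```python
# Hypothetical normalisation: one GPT-4o attempt costs 1.0 unit.
BASELINE_COST = 1.0
BASELINE_ACC = 0.13   # GPT-4o on AIME 2024, as stated in this report
REASONING_ACC = 0.83  # o1 on AIME 2024, as stated in this report


def cost_per_correct(cost_per_attempt: float, accuracy: float) -> float:
    """Batch spend divided by correct answers when every problem is tried once."""
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_attempt / accuracy


def compare(cost_multiplier: float) -> float:
    """How many times more the reasoning model costs per *correct* answer."""
    base = cost_per_correct(BASELINE_COST, BASELINE_ACC)
    reasoning = cost_per_correct(BASELINE_COST * cost_multiplier, REASONING_ACC)
    return reasoning / base


if __name__ == "__main__":
    for mult in (30, 50):
        print(f"{mult}x per-call cost -> {compare(mult):.1f}x cost per correct answer")
```

Under these assumptions the premium per correct answer is roughly 4.7x to 7.8x, well below the 30-50x per-call multiplier. It still understates the case when the requirement is >80% pass@1, because single GPT-4o attempts at 13% accuracy cannot reach that bar at any spend.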
Journey Context:
Teams often try few-shot CoT with GPT-4o on hard math and hit a wall at around 10-20% accuracy, because arithmetic errors compound across long derivations. o1 works through intermediate steps in an internal chain of thought before it answers, and in practice this is what makes AIME-level problems tractable; prompting GPT-4o harder does not get there. The cost is justified when the alternative is task failure or paying human mathematicians.
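The compounding-error wall can be illustrated with a deliberately simplified model in which a solution is a chain of independent steps and any single slip ruins the final answer. This Python sketch is illustrative only: the per-step accuracies and step counts are hypothetical, not measured values for GPT-4o or o1, and the function name is made up for this example.

```python
def chain_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step of an n-step derivation is correct,
    assuming step errors are independent (a deliberate simplification)."""
    if not 0 <= per_step_accuracy <= 1:
        raise ValueError("per_step_accuracy must be in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Hypothetical per-step accuracies and derivation lengths.
    for acc in (0.99, 0.98, 0.95):
        row = ", ".join(
            f"{n} steps: {chain_success(acc, n):.0%}" for n in (20, 50, 100)
        )
        print(f"per-step {acc:.0%} -> {row}")
```

Under this model, even 98% per-step accuracy over a 100-step derivation leaves only about 13% of solutions fully correct, which is why better prompting of the same model tends to plateau. Real errors are not independent, so treat this as intuition rather than a fit to the reported numbers.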
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:08:52.624923+00:00— report_created — created