Report #93678
[cost_intel] Using GPT-4o for AIME/IMO-level math or theoretical physics derivations
Use o3-mini-high or o1-preview for competition math; expect a 60-90% solve rate versus under 15% for GPT-4o.
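A minimal routing sketch of this recommendation, assuming tasks arrive already tagged with a kind label. The labels in `HARD_MATH_TASKS` and the function `pick_model` are hypothetical, and the model strings are the names used in this report, not guaranteed API identifiers; check your provider's model list before relying on them.

```python
# Hypothetical task labels; a real system needs its own classifier or tagging step.
HARD_MATH_TASKS = frozenset({"aime", "imo", "physics_derivation"})

# Model names as written in this report (assumption: your provider's identifiers may differ).
REASONING_MODEL = "o3-mini-high"
ALT_REASONING_MODEL = "o1-preview"
INSTRUCT_MODEL = "gpt-4o"


def pick_model(task_kind: str, use_o1: bool = False) -> str:
    """Send competition-level math to a reasoning model and everything else to GPT-4o."""
    if task_kind.lower() in HARD_MATH_TASKS:
        return ALT_REASONING_MODEL if use_o1 else REASONING_MODEL
    return INSTRUCT_MODEL


if __name__ == "__main__":
    for kind in ("AIME", "imo", "summarize"):
        print(kind, "->", pick_model(kind))
```

Keeping the routing table as plain constants makes it easy to swap model names when pricing or availability changes, without touching the routing logic.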
Journey Context:
Instruction-tuned models plateau on multi-step symbolic manipulation and backtracking. Reasoning models spend thousands of tokens exploring dead ends before finalizing a proof. The cost delta is 20-40x, but in high-stakes academic or engineering contexts the cost of an unsolved problem justifies the premium. Watch for o1 over-engineering simple arithmetic; explicitly require a 'verification' step in the prompt.
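One way to make "failure to solve justifies the premium" concrete is a break-even calculation. This sketch assumes a single attempt per problem, measures cost in units of one GPT-4o attempt, and models expected cost as attempt cost plus (1 - solve rate) times a failure cost `F` (what an unsolved problem costs you). Setting the two expected costs equal gives `F = (cost_ratio - 1) / (p_reasoning - p_instruct)`. The function names are hypothetical, and the solve rates and cost ratios are the report's ranges, not measured values.

```python
def expected_cost(attempt_cost: float, solve_rate: float, failure_cost: float) -> float:
    """Expected cost of one attempt: token cost plus the expected cost of failing."""
    return attempt_cost + (1.0 - solve_rate) * failure_cost


def break_even_failure_cost(cost_ratio: float, p_reasoning: float, p_instruct: float) -> float:
    """Failure cost (in GPT-4o-attempt units) above which the reasoning model is cheaper.

    Derived from: cost_ratio + (1 - p_reasoning) * F == 1 + (1 - p_instruct) * F
    """
    if p_reasoning <= p_instruct:
        raise ValueError("reasoning model must have the higher solve rate")
    return (cost_ratio - 1.0) / (p_reasoning - p_instruct)


if __name__ == "__main__":
    # Assumption: 0.15 is the report's upper bound for GPT-4o, which is the
    # least favourable case for the reasoning model (it yields the highest break-even).
    p_instruct = 0.15
    for cost_ratio in (20, 40):
        for p_reasoning in (0.60, 0.90):
            f = break_even_failure_cost(cost_ratio, p_reasoning, p_instruct)
            print(f"{cost_ratio}x, solve rate {p_reasoning:.0%}: break-even F = {f:.2f}")
```

Over the report's ranges, the break-even falls between about 25x (20x cost, 90% solve rate) and about 87x (40x cost, 60% solve rate) the cost of a single GPT-4o attempt; if an unsolved problem costs more than that, the reasoning model is the cheaper choice under this simple model. Retries and partial credit are ignored here.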
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:49:28.574680+00:00 - report_created - created