Report #69796
[cost\_intel] Using o1-pro for math proof generation costs 50x as much for a marginal gain
Use o1-pro only for proof verification and critique: generate proofs with GPT-4o or Claude 3.5 Sonnet, then verify them with o1-pro. Budget 10x cost for the verification stage only.
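A minimal sketch of the generate-then-verify loop described above, in Python. The `generate` and `verify` callables are hypothetical placeholders for calls to a cheap drafting model and an expensive reasoning model; no real client API is assumed, and the retry limit is an illustrative choice rather than something the report specifies.

```python
from typing import Callable, Optional, Tuple


def generate_then_verify(
    problem: str,
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Draft with the cheap model; spend the expensive model only on critique.

    generate(problem, critique) returns a proof draft (hypothetical wrapper).
    verify(problem, proof) returns None when no flaw is found, otherwise a
    critique string that is fed into the next draft (hypothetical wrapper).
    """
    critique: Optional[str] = None
    proof = ""
    for _ in range(max_rounds):
        proof = generate(problem, critique)
        critique = verify(problem, proof)
        if critique is None:
            return proof, True  # verifier accepted the draft
    return proof, False  # rounds exhausted, last draft still has an open critique


# Stand-in callables so the sketch runs without any model access.
def fake_generate(problem: str, critique: Optional[str]) -> str:
    return "proof v2" if critique else "proof v1"


def fake_verify(problem: str, proof: str) -> Optional[str]:
    return None if proof == "proof v2" else "step 3 does not follow"


accepted = generate_then_verify("example problem", fake_generate, fake_verify)
```

Passing the two model calls in as functions keeps the expensive verifier swappable, so the same loop can be costed against different model pairings.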
Journey Context:
o1-pro costs $200/1M tokens vs $4/1M for GPT-4o \(50x\), but improves proof generation by only ~15% on formal math benchmarks. For proof verification \(finding bugs\), however, o1-pro shows a 300% improvement over GPT-4o, catching subtle logical gaps. Teams incorrectly assume that generation and verification have the same cost-benefit curve. Checking a proof is an 'easier' task for reasoning models than producing one \(the P vs NP intuition: verifying a solution is cheaper than finding it\), so allocate the budget there.
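The cost arithmetic can be sketched as follows. The per-token prices are the ones quoted in this report. The token counts are assumptions chosen purely for illustration, and the single blended price per model ignores any input/output price split.

```python
# Prices quoted in this report (USD per 1M tokens).
O1_PRO_USD_PER_M = 200.0
GPT4O_USD_PER_M = 4.0


def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million price."""
    return tokens / 1_000_000 * usd_per_million


def compare(gen_tokens: int, verify_tokens: int) -> dict:
    """Compare three setups for one proof, relative to a GPT-4o-only baseline."""
    baseline = cost_usd(gen_tokens, GPT4O_USD_PER_M)   # GPT-4o generates, no verification
    all_o1 = cost_usd(gen_tokens, O1_PRO_USD_PER_M)    # o1-pro generates
    split = baseline + cost_usd(verify_tokens, O1_PRO_USD_PER_M)  # GPT-4o drafts, o1-pro verifies
    return {
        "baseline_usd": baseline,
        "all_o1_pro_usd": all_o1,
        "split_usd": split,
        "all_o1_pro_multiple": all_o1 / baseline,
        "split_multiple": split / baseline,
    }


# Assumed workload: 10,000 generation tokens and 2,000 verification tokens per proof.
summary = compare(gen_tokens=10_000, verify_tokens=2_000)
```

Under these assumed token counts, generating with o1-pro costs 50x the baseline, while the split pipeline costs roughly 11x. The split multiple depends entirely on how many tokens the verifier consumes, so it should be measured on real traffic before setting a budget.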
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:38:08.726787+00:00— report_created — created