Report #74542
[cost_intel] When should I chain 4o generation with o1-mini verification versus using o1 throughout?
Use GPT-4o to generate code drafts or answers, then o1-mini to verify correctness, security, and logic; reserve full o1 for novel algorithm generation, where the synthesis itself requires reasoning. This chain achieves roughly 95% of o1 quality at about 15% of the cost.
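The routing rule above can be sketched as a small lookup. This is only an illustration: the task-kind labels and the `choose_generator` helper are hypothetical names, not part of any library, and the model identifiers are used as plain strings.

```python
# Hypothetical task-kind labels for work where the synthesis itself needs reasoning.
NOVEL_SYNTHESIS_KINDS = {"custom_algorithm", "cryptography", "math_proof"}

# Assumption: the verifier stays the same for every chained task.
VERIFIER_MODEL = "o1-mini"


def choose_generator(task_kind: str) -> str:
    """Pick the generation model for a task.

    Novel synthesis goes to the full reasoning model; everything else,
    including boilerplate such as CRUD code, goes to the cheaper model.
    """
    if task_kind in NOVEL_SYNTHESIS_KINDS:
        return "o1"
    return "gpt-4o"
```

For example, `choose_generator("crud")` selects the cheap generator, while `choose_generator("cryptography")` selects o1.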
Journey Context:
Running a reasoning model on every code completion is cost-prohibitive: at the quoted prices of $0.60/1M input + $12/1M output tokens for o1-mini versus $0.15/1M + $0.60/1M for 4o-mini, the gap is 20x on output tokens (4x on input). However, 4o-mini produces subtle logic bugs in 15% of complex functions. The optimal architecture is a 'generate-verify' chain: 4o generates 3-5 candidate implementations cheaply (as parallel calls), then o1-mini ranks them for correctness and security, either selecting the best or requesting regeneration. This costs ~$0.05 per task vs $0.30 for pure o1 generation (about one sixth), with <5% accuracy loss on SWE-bench tasks. The exception is work such as custom sorting algorithms, cryptographic code, or mathematical proofs, where the generation itself requires search through a solution space; there, use o1 for generation. Common mistake: using o1-mini to generate boilerplate CRUD code; it's 20x slower and 10x more expensive than 4o, with no quality gain on deterministic patterns.
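A minimal sketch of the generate-verify chain described above, assuming the model calls are wrapped in two caller-supplied functions. The `Generator` and `Verifier` signatures, `ChainResult`, the `accept_score` threshold, and the `max_rounds` limit are all illustrative assumptions; no real API client is used here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical signatures: real code would wrap the cheap generator model
# and the reasoning verifier model behind these callables.
Generator = Callable[[str], str]                      # prompt -> candidate
Verifier = Callable[[str, List[str]], List[float]]    # (prompt, candidates) -> scores


@dataclass
class ChainResult:
    candidate: Optional[str]  # best candidate seen, or None if nothing was generated
    score: float              # verifier score of that candidate
    rounds: int               # generation rounds actually used


def generate_verify(
    prompt: str,
    generate: Generator,
    verify: Verifier,
    n_candidates: int = 3,       # the report suggests 3-5
    accept_score: float = 0.8,   # assumed acceptance threshold
    max_rounds: int = 2,         # assumed cap on regeneration
) -> ChainResult:
    best: Optional[str] = None
    best_score = float("-inf")
    rounds_used = 0
    for round_no in range(1, max_rounds + 1):
        rounds_used = round_no
        # Generate candidates with parallel calls to the cheap model.
        with ThreadPoolExecutor(max_workers=n_candidates) as pool:
            candidates = list(pool.map(generate, [prompt] * n_candidates))
        # One verifier call scores the whole batch.
        scores = verify(prompt, candidates)
        for cand, score in zip(candidates, scores):
            if score > best_score:
                best, best_score = cand, score
        if best_score >= accept_score:
            break  # good enough; otherwise loop again to regenerate
    return ChainResult(best, best_score, rounds_used)
```

Keeping the verifier to one call per batch is a design choice that limits spend on the more expensive model; scoring each candidate separately would also work but multiplies verifier calls.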
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:42:53.428988+00:00— report_created — created