Report #95165
[cost\_intel] When should you chain a cheap instruct model with a reasoning check instead of using a reasoning model throughout?
For code review or complex analysis, use GPT-4o-mini to generate the initial critique or draft, then use o1-mini to verify that critique \(a cascade\). This achieves roughly 95% of o1-full quality at 15-20% of the cost. Using o1-full for both generation and verification is about 5x more expensive, with diminishing returns.
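The cascade described above can be sketched as a two-stage filter. This is a minimal illustration, not a definitive implementation: `Finding`, `cascade_review`, `stub_draft`, and `stub_verify` are hypothetical names, and the two stubs are plain string checks standing in for the cheap model and the reasoning model so that the example runs without any API client.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    """One candidate issue raised by the drafting stage."""
    line: int
    message: str


# Hypothetical signatures: each would wrap whatever model client you actually use.
DraftFn = Callable[[str], List[Finding]]
VerifyFn = Callable[[str, Finding], bool]


def cascade_review(source: str, draft: DraftFn, verify: VerifyFn) -> List[Finding]:
    """Draft candidate findings cheaply, then keep only those the verifier accepts."""
    candidates = draft(source)
    return [finding for finding in candidates if verify(source, finding)]


def stub_draft(source: str) -> List[Finding]:
    """Stand-in for the cheap model: flags every line that contains a division."""
    findings = []
    for number, text in enumerate(source.splitlines(), start=1):
        if "/" in text:
            findings.append(Finding(number, "possible division by zero"))
    return findings


def stub_verify(source: str, finding: Finding) -> bool:
    """Stand-in for the reasoning model: rejects findings on commented-out lines."""
    text = source.splitlines()[finding.line - 1]
    return not text.lstrip().startswith("#")


SAMPLE = "# ratio = a / b\nratio = a / b\ntotal = a + b"
confirmed = cascade_review(SAMPLE, stub_draft, stub_verify)
for finding in confirmed:
    print(f"line {finding.line}: {finding.message}")
```

In a real setup the verifier is called only on the drafted findings, not on the whole input, which is where the cost saving is assumed to come from.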
Journey Context:
A common mistake is to use o1 for both drafting and reviewing, or to use cheap models for both. The 'FrugalGPT' cascade insight is that generation needs creativity and breadth \(a cheap model suffices\), while verification needs correctness and depth \(a reasoning model is required\). In code review, GPT-4o-mini catches 70% of obvious issues; o1-mini then verifies the logic of those catches with 95% accuracy and rejects false positives. Using o1-full for generation spends reasoning capacity on 'writing' rather than 'checking'. Each single-model setup has a recognizable quality degradation signature: 'false positive fatigue' when cheap models are used alone, and 'over-analysis cost' when expensive models are used alone. Cost math \(per 1M tokens\): o1-full ~$60, o1-mini ~$3.30, GPT-4o-mini ~$0.15. The cascade uses 1x generation \(cheap\) \+ 0.3x verification \(reasoning\), versus 1.3x at the full reasoning price.
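The cost math can be worked through directly. The sketch below assumes a single flat price per 1M tokens for each model (the figures from this report) and treats the 0.3x verification share as a fraction of the generation token volume; the function names and that interpretation are assumptions, and hidden reasoning tokens and separate input/output prices are ignored.

```python
# Approximate prices in USD per 1M tokens, as quoted in this report.
PRICE_PER_M = {"o1-full": 60.00, "o1-mini": 3.30, "gpt-4o-mini": 0.15}


def cascade_cost(gen_tokens_m: float, verify_fraction: float = 0.3) -> float:
    """Cheap generation plus a reasoning check on a fraction of that volume."""
    generation = gen_tokens_m * PRICE_PER_M["gpt-4o-mini"]
    verification = gen_tokens_m * verify_fraction * PRICE_PER_M["o1-mini"]
    return generation + verification


def full_reasoning_cost(gen_tokens_m: float, verify_fraction: float = 0.3) -> float:
    """The same token volume, all billed at the full reasoning model's price."""
    return gen_tokens_m * (1 + verify_fraction) * PRICE_PER_M["o1-full"]


cascade = cascade_cost(1.0)        # 1M generation tokens
full = full_reasoning_cost(1.0)
print(f"cascade: ${cascade:.2f}  full: ${full:.2f}  ratio: {cascade / full:.1%}")
```

Under these simplified assumptions the cascade costs about $1.14 against $78.00, roughly 1.5% of the full-reasoning cost. That is well below the 15-20% figure quoted earlier in this report, which presumably reflects factors this sketch leaves out, so it is worth rerunning the numbers with your own token counts and current prices.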
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:18:51.339355+00:00— report_created — created