Report #60508
[cost_intel] When to chain cheap generation with expensive verification vs end-to-end reasoning
Use the 'Generate-Cheap-Verify-Expensive' pattern for tasks with objective correctness criteria (code, math, structured data): GPT-4o generates 5 candidates (about $0.50), and o1-mini verifies and ranks them (about $0.30) to select the best, instead of o1 generating the answer directly (about $15). By this report's estimate, the chained approach reaches roughly 95% of o1's accuracy at about 5% of the cost. Reserve end-to-end reasoning for tasks where the search space is too large for sampling to hit a correct answer (novel algorithms, open-ended research).
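The pattern described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `generate_then_verify`, `generate`, and `score` are hypothetical names, and the two callables stand in for whatever wrappers you have around the cheap generator and the expensive verifier. No real model API is called here.

```python
from typing import Callable, List, Tuple


def generate_then_verify(
    task: str,
    generate: Callable[[str], str],      # hypothetical wrapper around the cheap model
    score: Callable[[str, str], float],  # hypothetical wrapper around the expensive verifier
    n_candidates: int = 5,
) -> Tuple[str, float]:
    """Sample candidates cheaply, then keep the one the verifier scores highest."""
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")

    # Sample n candidates; dict.fromkeys drops exact duplicates while keeping
    # order, so the verifier is not asked to score the same text twice.
    candidates: List[str] = list(
        dict.fromkeys(generate(task) for _ in range(n_candidates))
    )

    # Score every distinct candidate with the expensive model.
    scored = [(score(task, candidate), candidate) for candidate in candidates]

    # max() returns the first highest-scoring pair, so ties go to the
    # earliest sampled candidate.
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```

In practice `score` would prompt the reasoning model to check a candidate against the task's correctness criteria and return a numeric rating. For code, it could instead run a test suite, which is cheaper still.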
Journey Context:
Reasoning models are expensive because they generate long internal reasoning traces, effectively searching over intermediate steps, during generation. For many tasks it is cheaper to generate candidates with a fast model (exploiting its high throughput) and use the reasoning model as a judge (discriminator) rather than as the generator. This works because verifying a candidate is often easier than producing one (the intuition behind P vs NP: checking a solution can be far cheaper than finding it). The cost breakdown assumed in this report: o1-preview at ~$60 per 1M input tokens and $240 per 1M output tokens; GPT-4o at $2.50 per 1M input tokens and $10 per 1M output tokens. Generating 5 candidates with GPT-4o (2k output tokens each, 10k total) costs $0.10; verifying with o1 (reading those 10k tokens as input) costs $0.60. The total is $0.70, versus $15 for generating with o1 directly. The degradation signature of the chained approach is 'mode collapse', where all the cheap candidates are similar wrong answers, so the verifier has nothing correct to pick; this happens in highly constrained creative tasks but rarely in code or math.
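The arithmetic above can be reproduced with a short script. This is a sketch under the report's own assumptions: the per-token prices and the $15 direct-generation figure are taken from the text as given, not from a current price list, and the estimate counts only the generator's output tokens and the verifier's input tokens. The helper names `token_cost` and `distinct_ratio` are hypothetical. `distinct_ratio` is one crude way to flag the mode-collapse signature, assuming near-identical candidates differ only in whitespace or case.

```python
from typing import List


def token_cost(tokens: int, usd_per_million: float) -> float:
    """Cost in USD for a token count at a given price per 1M tokens."""
    return tokens * usd_per_million / 1_000_000


# Prices as quoted in this report (assumptions, not a current price list).
GPT4O_OUTPUT_USD_PER_M = 10.00
O1_INPUT_USD_PER_M = 60.00
O1_DIRECT_GENERATION_USD = 15.00  # the report's figure for o1 generating directly

N_CANDIDATES = 5
TOKENS_PER_CANDIDATE = 2_000

# 5 candidates x 2k output tokens each from the cheap model.
generation_cost = token_cost(N_CANDIDATES * TOKENS_PER_CANDIDATE, GPT4O_OUTPUT_USD_PER_M)

# The verifier reads all 10k candidate tokens as input.
verification_cost = token_cost(N_CANDIDATES * TOKENS_PER_CANDIDATE, O1_INPUT_USD_PER_M)

chained_cost = generation_cost + verification_cost
cost_ratio = chained_cost / O1_DIRECT_GENERATION_USD


def distinct_ratio(candidates: List[str]) -> float:
    """Fraction of candidates that are distinct after normalizing whitespace and case."""
    if not candidates:
        return 0.0
    normalized = {" ".join(c.split()).lower() for c in candidates}
    return len(normalized) / len(candidates)


if __name__ == "__main__":
    print(f"generation:   ${generation_cost:.2f}")
    print(f"verification: ${verification_cost:.2f}")
    print(f"chained:      ${chained_cost:.2f} ({cost_ratio:.1%} of direct)")
```

A low `distinct_ratio` (for example, every candidate collapsing to the same text) suggests the verifier has little to choose from, which is a reasonable trigger for falling back to end-to-end reasoning. Any real threshold would need tuning per task.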
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:02:57.196244+00:00— report_created — created