Report #65712
[cost\_intel] When does explicit CoT prompting on GPT-4o outperform o1-mini on reasoning tasks?
Use GPT-4o with explicit 'think step by step' \+ forced JSON structure for <5 step reasoning; use o1-mini only when implicit chain-of-thought requires >1000 tokens of internal reasoning.
Journey Context:
o1-mini internally 'thinks' for ~10-20 seconds, which is equivalent to ~2-4k tokens of reasoning. For simple 2-3 step logic puzzles, GPT-4o with few-shot CoT achieves 85-90% of o1-mini's accuracy at 1/50th the cost and 1/100th the latency. The signature that you need o1 is when GPT-4o's CoT becomes >5 steps deep and accuracy drops below 70% due to compounding hallucinations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:46:40.017867+00:00— report_created — created