Report #95612
[cost\_intel] Using o1-preview for all reasoning tasks indiscriminately
o1-mini nearly matches o1-preview on competitive math \(AIME: 90% vs 92%\) and coding logic \(Codeforces Elo: 1650 vs 1670\) at 1/30th the cost \($3.00 vs $90 per 1M input tokens\). Reserve o1-preview for tasks that require >2000-token context windows or domain knowledge synthesis \(biology, legal reasoning\); do not use it for algorithmic reasoning.
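The selection rule above can be sketched as a small routing function. This is a minimal illustration, not a tested policy: the `Task` type, its fields, the `pick_model` name, and the way a task gets flagged as needing domain synthesis are all hypothetical; only the 2000-token threshold and the two model names come from this report.

```python
from dataclasses import dataclass

# Threshold taken from this report; treat it as a starting point, not a verified cutoff.
CONTEXT_TOKEN_THRESHOLD = 2000


@dataclass
class Task:
    """Hypothetical description of a reasoning request."""
    context_tokens: int           # size of the context the task needs
    needs_domain_synthesis: bool  # e.g. biology or legal reasoning


def pick_model(task: Task) -> str:
    """Return the cheaper model unless the task meets one of the report's criteria."""
    if task.needs_domain_synthesis or task.context_tokens > CONTEXT_TOKEN_THRESHOLD:
        return "o1-preview"
    return "o1-mini"


if __name__ == "__main__":
    print(pick_model(Task(context_tokens=800, needs_domain_synthesis=False)))   # o1-mini
    print(pick_model(Task(context_tokens=5000, needs_domain_synthesis=False)))  # o1-preview
    print(pick_model(Task(context_tokens=800, needs_domain_synthesis=True)))    # o1-preview
```

Defaulting to the cheaper model and escalating only on explicit criteria keeps the expensive path opt-in rather than opt-out.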
Journey Context:
o1-preview costs 30x more than o1-mini and 100x more than GPT-4o, so defaulting to the 'strongest model' for every reasoning task is financially catastrophic. Key insight: o1-mini's hidden reasoning is nearly as capable as o1-preview's for STEM pattern matching, but it lacks broad world knowledge. Common error: using o1 for 'explain this code', which is overkill; use GPT-4o instead. Where o1-preview earns its price: tasks like 'debug this distributed systems race condition' that require synthesizing kernel docs \+ logs; o1-mini fails here. Cost math: at the input prices quoted above, 1M input tokens/day is about $2,700/month on o1-preview \($90/day × 30 days\) versus about $90/month on o1-mini \($3/day × 30 days\).
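The monthly cost is the price per 1M tokens, times millions of tokens per day, times days in the month. Below is a minimal sketch of that arithmetic, assuming a 30-day month and input tokens only; the prices are the ones quoted in this report and are not checked against current pricing, and the function and constant names are illustrative.

```python
# USD per 1M input tokens, as quoted in this report (assumption: not verified).
PRICE_PER_1M_INPUT_USD = {"o1-preview": 90.00, "o1-mini": 3.00}


def monthly_input_cost(model: str, tokens_per_day: int, days: int = 30) -> float:
    """Return the input-token cost in USD for `days` days of steady usage."""
    millions_per_day = tokens_per_day / 1_000_000
    return PRICE_PER_1M_INPUT_USD[model] * millions_per_day * days


if __name__ == "__main__":
    for model in PRICE_PER_1M_INPUT_USD:
        cost = monthly_input_cost(model, tokens_per_day=1_000_000)
        print(f"{model}: ${cost:,.2f}/month")
    # o1-preview: $2,700.00/month
    # o1-mini: $90.00/month
```

Output tokens, including hidden reasoning tokens, are billed separately and are not modeled here, so real bills will be higher than these input-only figures.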
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T19:04:03.303012+00:00— report_created — created