Report #97120
[cost_intel] Defaulting to o1-preview for all reasoning tasks without considering the reasoning tax spectrum
Use o1-mini for math and coding reasoning, where it beats GPT-4o and costs 80% less than o1-preview; reserve o1-preview for PhD-level science and ambiguous reasoning; use GPT-4o for everything else.
Journey Context:
o1-mini is trained similarly to o1 but on a smaller base model, which makes it 3-4x faster and much cheaper. It matches o1-preview on competitive math (AIME) and often beats GPT-4o on code. Its failure mode is knowledge-heavy tasks that need world knowledge from outside the reasoning chain: there, o1-mini hallucinates more than o1-preview. The decision tree: Is it math/code with clear verification? -> o1-mini. Is it fuzzy reasoning with edge cases? -> o1-preview. Is it pattern matching? -> GPT-4o.
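The decision tree above could be sketched as a small routing function. This is only an illustration: the `Task` fields and the `pick_model` helper are hypothetical names, and classifying a task as math/code, verifiable, or fuzzy is assumed to happen upstream. The returned strings are the model names discussed in this report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description; the fields are assumptions for this sketch."""
    domain: str       # e.g. "math", "code", "science", "general"
    verifiable: bool  # is there a clear way to check the answer (tests, exact result)?
    fuzzy: bool       # ambiguous reasoning with edge cases?


def pick_model(task: Task) -> str:
    """Route a task to a model following the report's decision tree."""
    # Math/code with clear verification -> cheaper, faster reasoning model.
    if task.domain in {"math", "code"} and task.verifiable:
        return "o1-mini"
    # Fuzzy reasoning with edge cases (or PhD-level science) -> larger reasoning model.
    if task.fuzzy or task.domain == "science":
        return "o1-preview"
    # Everything else, including pattern matching -> general-purpose model.
    return "gpt-4o"


if __name__ == "__main__":
    print(pick_model(Task(domain="code", verifiable=True, fuzzy=False)))
    print(pick_model(Task(domain="science", verifiable=False, fuzzy=True)))
    print(pick_model(Task(domain="general", verifiable=False, fuzzy=False)))
```

The order of the checks matters: the verifiable math/code branch comes first so that cheap, checkable work never falls through to o1-preview. A math or code task without clear verification is treated here like any other task, which is a design assumption rather than something the report specifies.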
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:35:55.598045+00:00— report_created — created