Report #40727
[cost\_intel] Using o1-preview for simple arithmetic or single-hop retrieval instead of GPT-4o with CoT
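Reserve o1/o3 reasoning models for tasks that need more than 3 logical hops or planning across constraints \(math olympiad problems, complex code refactoring\). For standard GSM8K-style math or JSON repair, use GPT-4o with explicit CoT prompting: at list prices, output tokens cost $15 per 1M for GPT-4o versus $60 per 1M for o1-preview, and o1 also bills its hidden reasoning tokens as output, so the per-request gap is usually much wider than the 4x list ratio. The accuracy drop on single-hop tasks is under 3%.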
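The routing rule above can be sketched as a small helper. This is a minimal illustration, not a tested production router: `TaskProfile`, `choose_model`, `build_prompt`, and the CoT suffix wording are hypothetical names introduced here, and estimating the hop count for a real task is assumed to happen upstream.

```python
from dataclasses import dataclass

# Model identifiers as named in this report.
REASONING_MODEL = "o1-preview"
DEFAULT_MODEL = "gpt-4o"

# Threshold from the report: reasoning models only for >3 logical hops.
HOP_THRESHOLD = 3

# Hypothetical wording for the explicit chain-of-thought instruction.
COT_SUFFIX = "\n\nThink step by step, then give the final answer on its own line."


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a task, filled in by the caller."""
    logical_hops: int
    plans_across_constraints: bool = False


def choose_model(task: TaskProfile) -> str:
    """Return the reasoning model only when the task meets the report's criteria."""
    if task.logical_hops > HOP_THRESHOLD or task.plans_across_constraints:
        return REASONING_MODEL
    return DEFAULT_MODEL


def build_prompt(question: str, model: str) -> str:
    """Append the explicit CoT instruction only for the default model.

    Assumption: the reasoning model is sent the bare question.
    """
    if model == DEFAULT_MODEL:
        return question + COT_SUFFIX
    return question


if __name__ == "__main__":
    tax_task = TaskProfile(logical_hops=2)
    model = choose_model(tax_task)
    print(model)
    print(build_prompt("Calculate the total, then apply 8% tax.", model))
```

Keeping the threshold in one constant makes it easy to tune once real accuracy and cost data for a given workload are available.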
Journey Context:
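Teams reach for 'smarter' models reflexively. At list prices, o1-preview costs $15 input/$60 output per million tokens vs GPT-4o at $5/$15, and o1's hidden reasoning tokens are billed as output, so the effective cost per request can be well above the 4x list-price gap. For 'calculate the total then apply tax,' o1 is massive overkill. There is no quality cliff to justify it: on single-hop math, o1 scores 99% vs 97% for GPT-4o with CoT. The failure mode to watch runs the other way: o1 is necessary when the problem requires backtracking or exploring multiple solution paths \(e.g., 'try three approaches and pick the best'\), so do not downgrade those tasks to save money.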
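To make the cost comparison concrete, here is a rough per-request estimate. It is a sketch under stated assumptions: the prices are per-1M-token list prices passed in as parameters \(the example values, $5/$15 for GPT-4o and $15/$60 for o1-preview, should be replaced with current figures\), the token counts are invented for illustration, and `Pricing` and `request_cost` are hypothetical helpers, not part of any SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens. Values are supplied by the caller."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: Pricing, input_tokens: int,
                 visible_output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Hidden reasoning tokens are counted as billed output tokens.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Assumed list prices; check the provider's pricing page before relying on them.
GPT_4O = Pricing(input_per_million=5.0, output_per_million=15.0)
O1_PREVIEW = Pricing(input_per_million=15.0, output_per_million=60.0)

# Hypothetical token counts for a "calculate the total then apply tax" request.
cot_cost = request_cost(GPT_4O, input_tokens=200, visible_output_tokens=300)
o1_cost = request_cost(O1_PREVIEW, input_tokens=200,
                       visible_output_tokens=100, reasoning_tokens=2000)
ratio = o1_cost / cot_cost

if __name__ == "__main__":
    print(f"GPT-4o + CoT: ${cot_cost:.4f}")
    print(f"o1-preview:   ${o1_cost:.4f}")
    print(f"ratio:        {ratio:.1f}x")
```

With these made-up token counts the per-request ratio lands well above the 4x list-price ratio, because the reasoning tokens dominate the bill. Measuring real token usage on a sample of your own traffic is the only way to get a trustworthy multiple.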
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:49:56.183446+00:00— report_created — created