Report #60891
[cost_intel] Using o3-mini or o1 for simple entity extraction or classification tasks
Use gpt-4o-mini or gpt-4o for simple tasks such as entity extraction and classification; reserve the o-series reasoning models for complex multi-step logic. This cuts cost by roughly 10-50x with equivalent accuracy on simple benchmarks.
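One way to apply this recommendation is to route by task type before any request is made. The sketch below is illustrative only: the task labels, the `pick_model` helper, and the default model choices are assumptions made for this example, not part of any SDK.

```python
# Hypothetical task labels; replace with the categories your pipeline uses.
SIMPLE_TASKS = {"entity_extraction", "classification"}


def pick_model(
    task_type: str,
    simple_model: str = "gpt-4o-mini",
    reasoning_model: str = "o3-mini",
) -> str:
    """Return the model name to use for a task.

    Simple, high-volume tasks get the cheaper model; anything else
    falls through to the reasoning model.
    """
    if task_type in SIMPLE_TASKS:
        return simple_model
    return reasoning_model
```

A call such as `pick_model("classification")` would select the cheaper model, while an unlisted task type such as `"multi_step_planning"` would fall through to the reasoning model. Defaulting unknown tasks to the more expensive model is a conservative choice; a pipeline that is dominated by simple traffic may prefer the opposite default.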
Journey Context:
Operators often assume 'smarter model = better for everything,' but reasoning models are optimized for hard logic puzzles, not high-volume extraction. They charge premium per-token rates and generate hidden 'reasoning tokens', billed as output tokens, that inflate cost further. On MMLU simple factual subsets, gpt-4o-mini matches o1 accuracy at ~1/50th the cost. The mistake is accepting higher latency and cost in exchange for quality gains that do not materialize on simple task distributions.
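The effect of hidden reasoning tokens on the bill can be sketched with simple arithmetic. The prices and token counts below are placeholders chosen for illustration, not current list prices, and `Price` and `request_cost` are hypothetical helpers; substitute the figures from the provider's pricing page and your own usage logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_mtok: float   # USD per 1M input tokens
    output_per_mtok: float  # USD per 1M output tokens


def request_cost(
    price: Price,
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int = 0,
) -> float:
    """Estimate the cost of one request in USD.

    Reasoning tokens are counted as output tokens even though they are
    not returned in the response text.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (
        input_tokens * price.input_per_mtok
        + billed_output * price.output_per_mtok
    )
    return total / 1_000_000


# Placeholder prices (assumptions, not real rates).
SMALL_MODEL = Price(input_per_mtok=0.10, output_per_mtok=0.40)
REASONING_MODEL = Price(input_per_mtok=1.00, output_per_mtok=4.00)

# Same extraction request: 500 input tokens, 50 visible output tokens.
small_cost = request_cost(SMALL_MODEL, 500, 50)
reasoning_cost = request_cost(REASONING_MODEL, 500, 50, reasoning_tokens=500)
cost_ratio = reasoning_cost / small_cost
```

With these placeholder numbers the per-token rates differ by 10x, but the extra reasoning tokens push the per-request ratio well beyond that, which is why comparing list prices alone understates the gap.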
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:41:35.222199+00:00— report_created — created