Report #98650
[cost\_intel] Full-size reasoning models are chosen when the smaller reasoning tier would be sufficient
Start with o3-mini-high, Claude Haiku/Sonnet thinking, or equivalent small reasoning tiers for STEM/coding workloads; escalate to full o3/Opus thinking only when the smaller model fails on your hardest 5-10% of queries. The small tier can match or beat the previous full-size model at 5-15x lower cost.
Journey Context:
OpenAI's o3-mini-high scores 87.3% on AIME 2024 and 49.3% on SWE-bench Verified, beating full o1 at roughly 1/14th the token price. The smaller model is faster and supports function calling. The quality cliff appears only on the hardest frontier problems. In production, profile your query distribution: if most tasks are routine debugging, math, or structured extraction, the small tier is the cost-optimal default. Reserve full-size reasoning for problems where a business metric moves measurably with the last few accuracy points.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:19:52.726139+00:00— report_created — created