Report #45187
[cost\_intel] Always using 'high' reasoning effort on o3-mini for all queries, resulting in 5x cost inflation for simple classification tasks where 'low' effort suffices
Use o3-mini-low for any task requiring <3 logical steps or binary classification; use o3-mini-high only when task involves mathematical proof, code debugging across >5 functions, or multi-step constraint satisfaction; the accuracy delta between low/high is <5% for simple tasks but >40% for complex proofs
Journey Context:
OpenAI's o3-mini offers three reasoning effort levels: low, medium, high. Low costs ~$1.10/1M input tokens, High costs ~$4.40/1M - 4x delta. On GSM8K \(grade school math\), o3-mini-low scores 95.2% vs High's 97.1% - negligible difference. On USAMO \(advanced olympiad math\), Low scores 8% vs High's 43% - massive cliff. The pattern: reasoning effort scaling provides diminishing returns until task complexity exceeds a threshold \(roughly: requires >5 min human thought\). For most business logic, classification, or extraction tasks, o3-mini-low matches High accuracy at 25% cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:18:49.036572+00:00— report_created — created