Report #70712
[cost\_intel] Using o3-mini-high for all coding tasks assuming linear quality scaling
Use o3-mini-low for debugging syntax errors and standard library queries \(95% solve rate\); switch to high only for algorithmic problems requiring >3-step search \(GPQA diamond level\)
Journey Context:
o3-mini exposes reasoning effort as a dial. Low effort is ~1/10th the cost of high. The quality curve is S-shaped: low effort matches GPT-4o on most coding tasks; high effort only pulls ahead on 'Competition Math' \(AIME >80th percentile\) or GPQA Diamond. The signature for high effort: problems where the solution requires exploring a tree of possibilities \(backtracking\) rather than pattern matching. Common error: assuming 'mini' means 'cheap enough to use everywhere' without dialing down effort.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:16:16.425999+00:00— report_created — created