Report #96743
[cost_intel] Using o1 for all coding tasks indiscriminately
Route to o1 only for problems at Codeforces Div 2+ difficulty or for algorithmic changes of more than 50 lines; use GPT-4o for CRUD, bug fixes, and refactors, where it is 30x cheaper with negligible quality loss on routine tasks.
Journey Context:
On Codeforces, o1-preview reaches an Elo of ~1800 (62nd percentile) versus GPT-4o's ~1200 (20th percentile). On HumanEval (simple functions), however, GPT-4o scores 90%+ while o1 reaches only ~92%, at 50x the cost and with 20s of added latency. The deciding signal is cyclomatic complexity: if the solution requires nested invariants, route to the reasoning model; if it is API plumbing, stay on GPT-4o.
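The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a tested production router: the `CodingTask` fields, the `choose_model` function, the task-kind labels, and the model identifier strings are all hypothetical names chosen for this example, and the 50-line threshold is simply the figure quoted in this report.

```python
from dataclasses import dataclass

# Assumed model identifiers, used here only as labels.
REASONING_MODEL = "o1"
DEFAULT_MODEL = "gpt-4o"


@dataclass
class CodingTask:
    kind: str                        # hypothetical label, e.g. "crud", "bug_fix", "algorithm"
    changed_lines: int               # estimated size of the change in lines
    nested_invariants: bool = False  # does the solution hinge on nested invariants?


def choose_model(task: CodingTask, line_threshold: int = 50) -> str:
    """Pick a model for a task using the heuristic described in this report."""
    # Solutions that depend on nested invariants go to the reasoning model.
    if task.nested_invariants:
        return REASONING_MODEL
    # Algorithmic changes strictly larger than the threshold also qualify.
    if task.kind == "algorithm" and task.changed_lines > line_threshold:
        return REASONING_MODEL
    # Everything else (CRUD, bug fixes, refactors, API plumbing) stays cheap.
    return DEFAULT_MODEL


if __name__ == "__main__":
    print(choose_model(CodingTask(kind="crud", changed_lines=200)))
    print(choose_model(CodingTask(kind="algorithm", changed_lines=80)))
```

In practice the `nested_invariants` flag would have to come from some upstream judgment (a human label or a cheap classifier), which this sketch leaves out.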
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:57:59.029781+00:00— report_created — created