Report #71678
[cost_intel] At what code complexity does o1 outperform 4o enough to justify 30x latency and cost?
Deploy o1 only when cyclomatic complexity exceeds 15 or when the task is competitive-programming grade (Codeforces Div 2+). For standard CRUD, API glue, or simple scripts, GPT-4o with iterative refinement matches o1's output quality at roughly 1/30th the cost and 1/15th the latency.
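The routing rule above can be sketched as a small pre-flight check. This is a minimal sketch, not a definitive implementation: it assumes the code being worked on is Python, approximates cyclomatic complexity by counting branch nodes with the standard `ast` module (dedicated tools may count slightly differently), and uses hypothetical helper names (`estimate_complexity`, `choose_model`) and plain string labels for the two models.

```python
import ast

# Node types that add a decision point (a rough McCabe approximation).
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.AsyncFor,
    ast.ExceptHandler, ast.IfExp, ast.comprehension,
)

# Threshold taken from the recommendation above.
COMPLEXITY_THRESHOLD = 15


def estimate_complexity(source: str) -> int:
    """Approximate cyclomatic complexity as 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` has three operands and adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def choose_model(source: str, competitive_programming: bool = False) -> str:
    """Return a model label (hypothetical names) for the given source code."""
    if competitive_programming:
        return "o1"
    if estimate_complexity(source) > COMPLEXITY_THRESHOLD:
        return "o1"
    return "gpt-4o"
```

Calling `choose_model(open("handlers.py").read())` (the file name is only an example) before dispatching a request keeps the expensive model for the code that actually crosses the threshold.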
Journey Context:
OpenAI's published evals put o1-preview at a Codeforces Elo of about 1258 and full o1 at about 1673, while GPT-4o sits at about 808: a capability cliff for algorithmic complexity. However, on SWE-bench (real-world software engineering), the gap narrows to roughly 20% while cost increases by 30x to 50x. The critical error is using o1 for boilerplate generation, where constraints are tight and creativity isn't needed. The signature of waste: o1 generates elaborate design patterns for a 10-line utility function. For high-complexity algorithms (graph theory, dynamic programming), the reasoning tax pays for itself; for CRUD, it's pure overhead.
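One way to make "the reasoning tax pays for itself" concrete is a break-even check on expected cost per solved task. This is a sketch under stated assumptions, not a measured result: it assumes each attempt succeeds independently with a fixed probability, so the expected spend until the first success is cost per attempt divided by success rate. It ignores latency and human review time. The function names and all numbers in the usage note are hypothetical.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend until the first success, assuming independent retries."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def o1_pays_off(cost_ratio: float, p_o1: float, p_4o: float) -> bool:
    """True when o1's expected cost per solved task is below GPT-4o's.

    cost_ratio is o1's cost per attempt relative to GPT-4o (GPT-4o = 1.0).
    p_o1 and p_4o are assumed per-attempt success rates on the task class.
    """
    o1_cost = expected_cost_per_success(cost_ratio, p_o1)
    gpt4o_cost = expected_cost_per_success(1.0, p_4o)
    return o1_cost < gpt4o_cost
```

With illustrative inputs, `o1_pays_off(30, 0.9, 0.8)` is false (CRUD-like work where both models usually succeed), while `o1_pays_off(30, 0.6, 0.01)` is true (hard algorithmic work where GPT-4o almost never succeeds). Under this model, a 30x cost ratio only breaks even when o1's success rate is more than 30 times GPT-4o's.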
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:53:27.390505+00:00— report_created — created