Report #91476
[cost_intel] Instruct models plateau at <20% accuracy on competitive programming (Codeforces, LeetCode Hard), while reasoning models achieve >80%
Use o1/o3 for algorithmic generation, mathematical proofs, and complex constraint satisfaction; use GPT-4o only for implementing known algorithms or boilerplate
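The routing rule above can be sketched as a lookup from task category to model. This is a minimal sketch under stated assumptions: the `TaskKind` categories and the `pick_model` helper are hypothetical, and the strings `"o1"` and `"gpt-4o"` are placeholder labels. Check the exact model identifiers your provider expects before passing them to a client.

```python
from enum import Enum, auto


class TaskKind(Enum):
    """Hypothetical task categories mirroring the recommendation above."""
    ALGORITHM_DESIGN = auto()         # novel algorithmic generation (DP, graph search, ...)
    MATH_PROOF = auto()               # mathematical proofs
    CONSTRAINT_SATISFACTION = auto()  # complex constraint satisfaction
    KNOWN_ALGORITHM = auto()          # implementing a well-known algorithm
    BOILERPLATE = auto()              # glue code, scaffolding, CRUD


# Categories the report says should go to a reasoning model.
REASONING_TASKS = frozenset({
    TaskKind.ALGORITHM_DESIGN,
    TaskKind.MATH_PROOF,
    TaskKind.CONSTRAINT_SATISFACTION,
})


def pick_model(kind: TaskKind) -> str:
    """Return a placeholder model label for a task category."""
    if kind in REASONING_TASKS:
        return "o1"      # placeholder for a reasoning model (o1 / o3)
    return "gpt-4o"      # instruct model for known algorithms and boilerplate
```

Keeping the reasoning categories in one set makes the routing policy easy to audit and change without touching the call sites.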
Journey Context:
Instruct models do not unroll an explicit chain of thought during generation, so they tend to hallucinate logic steps in dynamic programming or graph algorithms. Reasoning models (o1, o3-mini-high) spend inference-time compute exploring candidate solution paths, yielding 50-80% solve rates on Codeforces Div 2 problems where GPT-4o scores <10%. The trade-off is 3-10x more tokens and 10x higher latency, so reserve them for offline code generation or interview prep rather than production hot paths.
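To gauge what this trade-off means for a specific workload, one can scale an instruct-model baseline by the multipliers quoted above. The sketch is illustrative only: `estimate_reasoning_cost`, `reasoning_fits`, and the notion of a per-call latency budget are hypothetical, and the 3-10x token and 10x latency factors are the report's rough figures, not measured constants.

```python
from dataclasses import dataclass
from typing import Optional

# Rough multipliers from the report (reasoning model vs. instruct model).
TOKEN_MULTIPLIER_RANGE = (3.0, 10.0)
LATENCY_MULTIPLIER = 10.0


@dataclass(frozen=True)
class Estimate:
    min_tokens: int
    max_tokens: int
    latency_s: float


def estimate_reasoning_cost(instruct_tokens: int, instruct_latency_s: float) -> Estimate:
    """Scale an instruct-model baseline by the report's multipliers."""
    lo, hi = TOKEN_MULTIPLIER_RANGE
    return Estimate(
        min_tokens=round(instruct_tokens * lo),
        max_tokens=round(instruct_tokens * hi),
        latency_s=instruct_latency_s * LATENCY_MULTIPLIER,
    )


def reasoning_fits(instruct_latency_s: float, latency_budget_s: Optional[float]) -> bool:
    """A budget of None means offline work with no latency limit."""
    if latency_budget_s is None:
        return True
    # Hot path: only allow the reasoning model if the scaled latency fits the budget.
    return instruct_latency_s * LATENCY_MULTIPLIER <= latency_budget_s
```

For a call that uses 1,000 tokens and takes 2 s on GPT-4o, `estimate_reasoning_cost(1_000, 2.0)` gives 3,000 to 10,000 tokens and about 20 s, so `reasoning_fits(2.0, 1.0)` rejects it for a 1 s hot-path budget while `reasoning_fits(2.0, None)` accepts it for offline generation.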
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:08:06.032050+00:00— report_created — created