Report #35898
[cost\_intel] Using reasoning models for competitive programming regardless of problem difficulty
Use GPT-4o for Codeforces Div. 2 Easy/Medium problems \(rating <1400\). Escalate to o1/o3 only for Hard/Tutorial problems \(rating >1600\) or after GPT-4o has failed twice on the same problem.
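A minimal sketch of this routing rule, assuming the problem rating is known before the model is called. The function name `pick_model` and the constants are hypothetical, not part of any SDK. The report does not say what to do for ratings between 1400 and 1600; this sketch assumes that band also starts on GPT-4o and relies on the two-failure escalation.

```python
CHEAP_MODEL = "gpt-4o"
REASONING_MODEL = "o1"

HARD_RATING = 1600       # from the report: escalate above this rating
MAX_CHEAP_FAILURES = 2   # from the report: escalate after two GPT-4o failures


def pick_model(rating: int, cheap_failures: int = 0) -> str:
    """Return the model to try next for a problem with the given rating.

    cheap_failures counts failed GPT-4o attempts on this problem so far.
    """
    if rating > HARD_RATING:
        return REASONING_MODEL
    if cheap_failures >= MAX_CHEAP_FAILURES:
        return REASONING_MODEL
    # Assumption: the 1400-1600 band, which the report leaves unspecified,
    # is treated like the easier band and starts on the cheap model.
    return CHEAP_MODEL
```

A caller would increment `cheap_failures` each time a GPT-4o submission is rejected and call `pick_model` again before retrying.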
Journey Context:
On Codeforces benchmarks, o1 reaches the 89th percentile while GPT-4o sits at the 11th. For problems rated <1400, however, GPT-4o already solves 85-90% correctly. o1 costs 6-10x more \($15 vs $2.50 per 1M input tokens\) and its latency is 10-15x higher \(5-10s vs <500ms to first token\). The telltale sign of a mismatch: o1 builds unnecessarily complex data structures for simple array-counting tasks. Use the problem rating as a hard filter.
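The cost argument can be checked with a short calculation. This is a rough sketch using only the input-token prices quoted above; output and reasoning tokens are ignored, so the price ratio it gives is the lower end of the quoted gap. The helper names are hypothetical, and treating retries as independent attempts with the same solve rate is a simplifying assumption.

```python
# USD per 1M input tokens, as quoted in this report.
PRICE_PER_M_INPUT = {"gpt-4o": 2.50, "o1": 15.00}


def input_cost(model: str, input_tokens: int) -> float:
    """Input-token cost in USD for a single call."""
    return PRICE_PER_M_INPUT[model] * input_tokens / 1_000_000


def expected_routed_cost(input_tokens: int, solve_rate: float) -> float:
    """Expected input cost when GPT-4o is tried up to twice, then o1 once.

    Assumes each GPT-4o attempt succeeds independently with solve_rate.
    """
    fail = 1.0 - solve_rate
    cheap = input_cost("gpt-4o", input_tokens)
    strong = input_cost("o1", input_tokens)
    # First attempt always runs, the second runs only after one failure,
    # and o1 runs only after two failures.
    return cheap * (1.0 + fail) + strong * fail * fail


price_ratio = PRICE_PER_M_INPUT["o1"] / PRICE_PER_M_INPUT["gpt-4o"]
always_o1 = input_cost("o1", 2_000)          # 2,000 tokens is a made-up prompt size
routed = expected_routed_cost(2_000, 0.85)   # 0.85 is the lower solve rate quoted above
```

Under these assumptions, routing a sub-1400 problem through GPT-4o first costs a fraction of sending it straight to o1, even after accounting for occasional escalation.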
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:44:04.822342+00:00— report_created — created