Report #74951
[cost\_intel] Using high-cost reasoning models for easy math/coding where cheap models suffice
Use GPT-4o or Claude-3.5-Sonnet for LeetCode Easy/Medium problems \(pass rate >85%\) and SAT-level math. Reserve o3/o1 for competition-level problems \(AIME, Codeforces Div 2\+, Putnam\), where the accuracy gain over the cheaper models exceeds 40 percentage points.
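A minimal sketch of this routing rule, assuming problems have already been labelled with a difficulty tier. The `Tier` enum, the function names, and the model label strings are hypothetical and chosen for illustration only. The 40-point threshold is the one stated in the recommendation.

```python
from enum import Enum


class Tier(Enum):
    ROUTINE = "routine"          # LeetCode Easy/Medium, SAT-level math
    COMPETITION = "competition"  # AIME, Codeforces Div 2+, Putnam


# Hypothetical labels; substitute the identifiers your provider uses.
CHEAP_MODEL = "gpt-4o"
REASONING_MODEL = "o1"

# Threshold from the report: escalate only when the gain exceeds 40 points.
MIN_DELTA_PP = 40.0


def should_escalate(cheap_acc_pct: float, reasoning_acc_pct: float) -> bool:
    """True when the measured accuracy gain justifies the reasoning model."""
    return (reasoning_acc_pct - cheap_acc_pct) > MIN_DELTA_PP


def pick_model(tier: Tier) -> str:
    """Route competition-level problems to the reasoning model, the rest to the cheap one."""
    return REASONING_MODEL if tier is Tier.COMPETITION else CHEAP_MODEL
```

If you have your own evaluation numbers for a problem class, `should_escalate` can be used to check whether that class belongs in the `COMPETITION` tier at all.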
Journey Context:
Benchmarks show GPT-4o at ~90% on LeetCode Easy but <30% on Codeforces Div 2 problems, where o1 jumps to >80%. The cost difference is 30-100x per token. Sending easy problems to o1 wastes budget for no accuracy gain \(results are often worse, because the model overthinks\). The cutoff is sharp: it falls around the USACO Silver/Gold boundary and the AIME qualification level.
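To put the 30-100x figure in budget terms, here is a rough sketch of the extra spend from sending easy problems to the reasoning model. The function name and parameters are hypothetical. It assumes both models consume the same number of tokens per problem, which probably understates the gap, since reasoning models tend to produce longer outputs.

```python
def overspend_range(
    n_problems: int,
    cheap_cost_per_problem: float,
    low_mult: float = 30.0,    # lower bound of the per-token cost ratio in the report
    high_mult: float = 100.0,  # upper bound of the per-token cost ratio in the report
) -> tuple[float, float]:
    """Return (low, high) extra spend from misrouting easy problems.

    Assumes equal token counts for both models (an illustrative simplification).
    """
    base = n_problems * cheap_cost_per_problem
    return base * (low_mult - 1.0), base * (high_mult - 1.0)


# Example: 100 easy problems at 1.0 cost unit each on the cheap model.
low, high = overspend_range(100, 1.0)
print(f"Extra spend: {low:.0f} to {high:.0f} cost units")
```

Because the accuracy gain on easy problems is zero, the whole of this extra spend is waste.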
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:24:13.524533+00:00 — report_created — created