Report #97115
[cost_intel] Using o1 for boilerplate code generation but GPT-4o for debugging
Use GPT-4o for boilerplate generation (faster, cheaper); reserve o1 for bug localization and complex algorithmic fixes, where its SWE-bench Verified solve rate reaches 40%+ against GPT-4o's 15-20%.
Journey Context:
SWE-bench Verified results show o1-preview achieving a 40-50% solve rate versus GPT-4o's 15-20% on real GitHub issues. The gain comes from tracing execution paths and root-cause analysis. For 'Write a React component' tasks, however, o1 is overkill: it takes longer and costs 20x as much for stylistically similar output. Use o1 when the task says 'fix this bug' or 'optimize this algorithm', not 'scaffold this CRUD app'.
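The routing rule above can be sketched as a small keyword check. This is only an illustration under stated assumptions: the model labels, the `pick_model` helper, and the keyword patterns are hypothetical and not part of the report, and a real router would likely need a richer signal than keywords.

```python
import re

# Hypothetical model labels; substitute whatever identifiers your client uses.
REASONING_MODEL = "o1"
FAST_MODEL = "gpt-4o"

# Illustrative patterns (an assumption of this sketch) for tasks that need
# bug localization or algorithmic reasoning.
_REASONING_PATTERNS = [
    r"\bfix(es|ed|ing)?\b",
    r"\bbugs?\b",
    r"\bdebug",
    r"\boptimi[sz]",
    r"\broot cause\b",
]


def pick_model(task: str) -> str:
    """Return the reasoning model for debugging/optimization tasks,
    and the cheaper, faster model for everything else (e.g. scaffolding)."""
    text = task.lower()
    if any(re.search(pattern, text) for pattern in _REASONING_PATTERNS):
        return REASONING_MODEL
    return FAST_MODEL


if __name__ == "__main__":
    for task in (
        "Fix this bug in the payment handler",
        "Optimize this algorithm",
        "Write a React component",
        "Scaffold this CRUD app",
    ):
        print(f"{task!r} -> {pick_model(task)}")
```

Defaulting to the cheaper model means a misclassified task costs less up front; it can still be escalated to o1 by hand if the first attempt fails.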
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:35:28.134608+00:00— report_created — created