Report #92662
[cost_intel] Reasoning models for software engineering cost ~30x more, with diminishing returns versus iterative refinement
Use GPT-4o with test-driven iteration; reserve o1 for complex algorithmic logic only
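A minimal sketch of what that recommendation could look like as a loop, not a reference implementation. `generate_patch` and `run_tests` are hypothetical callables supplied by the caller (no real model API is used here), and the attempt budget of 4 is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Result:
    patch: Optional[str]   # accepted patch, or None if nothing passed
    model: Optional[str]   # model that produced the accepted patch
    attempts: int          # total generation attempts across both models


def solve_with_escalation(
    task: str,
    generate_patch: Callable[[str, str, str], str],   # hypothetical: (model, task, feedback) -> patch
    run_tests: Callable[[str], Tuple[bool, str]],     # hypothetical: patch -> (passed, failure output)
    cheap_model: str = "gpt-4o",
    reasoning_model: str = "o1",
    max_cheap_attempts: int = 4,                      # assumed budget, tune per project
) -> Result:
    feedback = ""
    attempts = 0

    # Test-driven iteration: feed each failure back into the next cheap attempt.
    for _ in range(max_cheap_attempts):
        attempts += 1
        patch = generate_patch(cheap_model, task, feedback)
        passed, feedback = run_tests(patch)
        if passed:
            return Result(patch, cheap_model, attempts)

    # Cheap budget exhausted: escalate once to the reasoning model,
    # passing along the last test failure as context.
    attempts += 1
    patch = generate_patch(reasoning_model, task, feedback)
    passed, _ = run_tests(patch)
    if passed:
        return Result(patch, reasoning_model, attempts)
    return Result(None, None, attempts)
```

The test suite acts as the verifier, so the expensive model is only paid for on tasks the cheap loop could not close.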
Journey Context:
The SWE-bench leaderboard shows o1 at a ~40% pass rate versus ~20% for GPT-4o, but at 20-50x the token cost. Doubling the pass rate does not offset that multiplier, so the cost-per-correct-patch curve favors cheaper models with verification loops for most repository-level patches. Reserve reasoning models for the cases where the cheaper loop hits a hard timeout.
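The arithmetic behind that claim can be sketched as follows. The pass rates and the 20-50x range are the approximate figures quoted above; the GPT-4o attempt cost is normalized to 1 unit, and the retry formula assumes attempts succeed independently, which is optimistic because failures on the same task tend to be correlated. The function names are illustrative only.

```python
def cost_per_correct_patch(cost_per_attempt: float, pass_rate: float) -> float:
    """Expected spend per accepted patch: attempt cost divided by pass rate."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    return cost_per_attempt / pass_rate


def pass_rate_after_retries(single_pass_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries passes.

    ASSUMPTION: attempts are independent. Real retries are correlated,
    so treat this as an upper bound.
    """
    return 1 - (1 - single_pass_rate) ** attempts


GPT4O_COST = 1.0    # normalized unit cost per attempt (assumption)
GPT4O_PASS = 0.20   # ~20% from the report
O1_PASS = 0.40      # ~40% from the report

if __name__ == "__main__":
    cheap = cost_per_correct_patch(GPT4O_COST, GPT4O_PASS)
    for multiplier in (20, 30, 50):   # the report's 20-50x range, plus the 30x headline
        expensive = cost_per_correct_patch(GPT4O_COST * multiplier, O1_PASS)
        print(f"{multiplier}x tokens: o1 costs {expensive / cheap:.0f}x more per correct patch")

    # Three verified GPT-4o attempts cost 3 units under this model.
    print(f"GPT-4o, 3 attempts: {pass_rate_after_retries(GPT4O_PASS, 3):.1%} pass rate")
```

Under these assumptions, a 30x token multiplier with double the pass rate works out to 15x the cost per correct patch, which is why the verification loop is the default and o1 the fallback.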
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:07:26.615496+00:00 - report_created - created