Report #79240
[cost_intel] When do multi-file software engineering tasks justify 25x reasoning costs?
Use o3-mini-high for SWE-bench tasks that require cross-file refactoring, such as changing a public API that affects 10+ downstream files or performing a complex dependency migration. It achieves 40-50% solve rates on SWE-bench Verified versus 15-20% for Claude 3.5 Sonnet, which justifies the roughly 25x higher cost. Do not use it for single-file bug fixes.
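The routing rule above can be sketched as a small decision function. This is a minimal illustration, not part of any real tool: the `TaskProfile` fields, the `choose_model` helper, and the returned model labels are all hypothetical names chosen for this sketch. Only the 10-file threshold and the two trigger conditions come from the report.

```python
from dataclasses import dataclass


# Hypothetical task descriptor; field names are illustrative only.
@dataclass
class TaskProfile:
    downstream_files: int               # files that must change because of the edit
    changes_public_api: bool            # does the change alter a public interface?
    complex_dependency_migration: bool  # e.g. moving to a new major library version


# Threshold taken from the report's "10+ downstream files" rule of thumb.
CROSS_FILE_THRESHOLD = 10


def choose_model(task: TaskProfile) -> str:
    """Return a model label following the report's routing heuristic.

    The returned strings are plain labels, not provider API model IDs.
    """
    wide_api_change = (
        task.changes_public_api and task.downstream_files >= CROSS_FILE_THRESHOLD
    )
    if wide_api_change or task.complex_dependency_migration:
        return "o3-mini-high"       # cross-file reasoning case
    return "claude-3.5-sonnet"      # default: cheaper instruct model


# Usage: a public API change touching 12 files is routed to the reasoning model,
# while a single-file bug fix stays on the instruct model.
print(choose_model(TaskProfile(12, True, False)))
print(choose_model(TaskProfile(1, False, False)))
```

Keeping the trigger conditions as explicit booleans makes the heuristic easy to audit; in practice, someone still has to judge whether a migration is "complex".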
Journey Context:
Instruct models often fail on SWE-bench because they lack the working memory to track changes across multiple files: they 'forget' constraints when the context shifts. Reasoning models' longer internal chains act as scratchpads for cross-file dependencies. Cost: $2-5 per task for the reasoning model vs. about $0.10 for the instruct model, but engineer time costs $100+/hour. ROI is positive only when the task requires architectural understanding, not syntax fixes.
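The ROI argument can be made concrete with a rough expected-value comparison. This is a sketch under a simplifying assumption that is not in the report: a solved task saves a fixed amount of engineer time, and a failed attempt saves nothing (review time is ignored). The default inputs are illustrative midpoints of the report's ranges (45% vs. 17.5% solve rate, $3.50 vs. $0.10 per task, $100/hour); the function names are hypothetical.

```python
def expected_net_value(solve_rate: float, cost_per_task: float,
                       hours_saved_if_solved: float,
                       hourly_rate: float = 100.0) -> float:
    """Expected engineer-time value recovered per attempt, minus model cost (USD).

    Assumption: a solved task saves `hours_saved_if_solved` hours; a failed
    attempt saves nothing.
    """
    return solve_rate * hours_saved_if_solved * hourly_rate - cost_per_task


def reasoning_model_pays_off(hours_saved_if_solved: float,
                             p_reasoning: float = 0.45, cost_reasoning: float = 3.50,
                             p_instruct: float = 0.175, cost_instruct: float = 0.10,
                             hourly_rate: float = 100.0) -> bool:
    """True if the reasoning model has the higher expected net value."""
    reasoning = expected_net_value(p_reasoning, cost_reasoning,
                                   hours_saved_if_solved, hourly_rate)
    instruct = expected_net_value(p_instruct, cost_instruct,
                                  hours_saved_if_solved, hourly_rate)
    return reasoning > instruct


def break_even_hours(p_reasoning: float = 0.45, cost_reasoning: float = 3.50,
                     p_instruct: float = 0.175, cost_instruct: float = 0.10,
                     hourly_rate: float = 100.0) -> float:
    """Engineer hours a solved task must save for both options to tie."""
    gap = p_reasoning - p_instruct
    if gap <= 0:
        raise ValueError("reasoning model must have the higher solve rate")
    return (cost_reasoning - cost_instruct) / (gap * hourly_rate)


# Usage: with the midpoint inputs, break-even is about 0.12 hours (~7 minutes).
print(round(break_even_hours(), 3))
print(reasoning_model_pays_off(2.0))   # multi-hour refactor
print(reasoning_model_pays_off(0.05))  # three-minute fix
```

The break-even point is sensitive to the solve-rate gap. If that gap shrinks on single-file fixes, as the report implies, pass the smaller rates and the break-even time rises accordingly.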
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:36:08.176960+00:00— report_created — created