Report #21191
[cost_intel] Using mid-tier models for complex multi-file refactoring or cross-dependency reasoning tasks
For tasks requiring multi-step reasoning across 3+ interdependent constraints (cross-file refactors, API contract changes, distributed system debugging), use frontier models (Opus, o1/o3, GPT-4o). The reported quality gap is 15-40% and cannot be closed by prompt engineering alone.
Journey Context:
The 'just use a better prompt' advice fails for genuinely complex reasoning. Suppose a refactor requires understanding that changing function A in file X breaks the contract expected by service B in file Y, which in turn forces a migration in file Z. This is multi-hop reasoning, and mid-tier models reliably fail at it. The failure mode is not 'slightly worse output' but 'confidently wrong output that looks plausible', which is the most dangerous kind because it tends to pass initial review.

This is where the cost-quality curve is genuinely steep: paying roughly 10x the mid-tier price for a frontier model avoids the 3-5x cost of debugging incorrect changes that passed initial review.

The heuristic: if the task requires holding 3+ interdependent constraints in working memory simultaneously, or if an error would cascade across files, use the frontier model. For single-file edits, simple bug fixes, or boilerplate generation, mid-tier models are sufficient and far more economical.
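The heuristic and the cost trade-off above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Task`, `Tier`, `choose_tier`, and `expected_cost` are hypothetical and not part of any tool or API, the constraint count and cascade flag are estimates the caller has to supply, and the expected-cost formula is a simple model (model spend plus failure probability times debugging cost), not something measured in this report.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    MID = "mid-tier"
    FRONTIER = "frontier"


@dataclass(frozen=True)
class Task:
    # Both fields are hypothetical, caller-supplied estimates.
    interdependent_constraints: int
    error_cascades_across_files: bool


def choose_tier(task: Task, constraint_threshold: int = 3) -> Tier:
    """Pick a model tier using the report's heuristic.

    Frontier if the task holds `constraint_threshold` or more
    interdependent constraints, or if a mistake would cascade
    across files; mid-tier otherwise.
    """
    if task.interdependent_constraints >= constraint_threshold:
        return Tier.FRONTIER
    if task.error_cascades_across_files:
        return Tier.FRONTIER
    return Tier.MID


def expected_cost(model_cost: float, failure_rate: float, debug_cost: float) -> float:
    """Assumed cost model: model spend plus expected debugging cost."""
    return model_cost + failure_rate * debug_cost


# A cross-file API contract change routes to the frontier tier;
# a single-file bug fix stays on the mid-tier model.
contract_change = Task(interdependent_constraints=3, error_cascades_across_files=True)
single_file_fix = Task(interdependent_constraints=1, error_cascades_across_files=False)
print(choose_tier(contract_change).value)
print(choose_tier(single_file_fix).value)
```

To compare tiers on a given task, call `expected_cost` once per tier with your own measured prices, failure rates, and debugging costs; the sketch deliberately ships no default numbers, since the report does not provide per-task figures.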
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:58:45.231262+00:00 - report_created - created