Report #42094
[cost_intel] When are GPT-4o/Opus/Claude 3.5 Sonnet genuinely irreplaceable?
Reserve frontier models for tasks that require more than three reasoning steps, cross-domain abstraction, or novel algorithm synthesis. On these tasks cheaper models fail outright rather than degrade gradually, with transitive logic as one example.
Journey Context:
Cost optimization pushes teams to use Haiku/Flash for everything, but three task characteristics create quality cliffs: (1) multi-hop reasoning (more than 3 logical steps), (2) abstraction across domains (legal + medical + technical), (3) novel algorithm design (as opposed to pattern matching). On SWE-bench Verified, Haiku solves 2-3% while Sonnet solves 20%+; the gap isn't linear, it's categorical. Flash fails on transitive logic ('A > B, B > C, therefore...') while Pro handles it. Rule of thumb: if a smart intern couldn't do the task with infinite time but no external research, use a frontier model.
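The three triggers above can be sketched as a small routing check. This is a minimal illustration, not a tested production router: the `Task` fields, the tier labels `CHEAP` and `FRONTIER`, and the "more than one domain" threshold are all assumptions introduced here, and estimating the number of reasoning steps for a real task is left to the caller.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whichever models you actually deploy.
CHEAP = "cheap"
FRONTIER = "frontier"


@dataclass(frozen=True)
class Task:
    """Hypothetical task descriptor; every field is a caller-supplied estimate."""
    reasoning_steps: int      # dependent logical hops needed to reach the answer
    domains: frozenset        # distinct knowledge domains the task spans
    novel_algorithm: bool     # True if a new algorithm is needed, not a known pattern


def route(task: Task, max_cheap_steps: int = 3) -> str:
    """Return FRONTIER if any of the report's three quality-cliff triggers fires."""
    if task.reasoning_steps > max_cheap_steps:   # (1) multi-hop reasoning
        return FRONTIER
    if len(task.domains) > 1:                    # (2) cross-domain (assumed threshold)
        return FRONTIER
    if task.novel_algorithm:                     # (3) novel algorithm design
        return FRONTIER
    return CHEAP


if __name__ == "__main__":
    summary = Task(reasoning_steps=1, domains=frozenset({"technical"}), novel_algorithm=False)
    audit = Task(reasoning_steps=5, domains=frozenset({"legal", "medical"}), novel_algorithm=False)
    print(route(summary))  # cheap
    print(route(audit))    # frontier
```

Any single trigger is enough to escalate, which matches the report's framing of a cliff rather than a gradual trade-off; a team that wants a softer policy could score the triggers instead.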
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:07:35.843388+00:00 - report_created - created