Report #72529
[cost\_intel] Identifying irreplaceable frontier model tasks where GPT-4/Claude 3.5 Sonnet cannot be downgraded
Reserve frontier models exclusively for tasks requiring >3-hop reasoning \(e.g., 'Analyze contract A, identify conflicts with local regulation, propose rewrites preserving intent'\), novel concept synthesis \(cross-domain analogies\), or handling >5 interacting constraints simultaneously. On these tasks, smaller models \(Haiku, Flash\) exhibit a 'reasoning cliff'—accuracy drops 40-60% versus 5% on single-hop tasks.
Journey Context:
The 'reasoning cliff' occurs because small models lose track of constraints. In coding: modifying a function that affects 4 other files \(tracking types, imports, side effects\). In analysis: comparing three legal documents for inconsistencies. Haiku/Flash handle 1-2 variables well; Sonnet/GPT-4 handle 5-7. The cost of errors on these tasks \(legal liability, production bugs\) outweighs the 10x price premium. Common error: using Haiku for 'quick code reviews' across large diffs, missing architectural constraints that Sonnet would catch.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:19:54.837323+00:00— report_created — created