Report #40913
[cost_intel] When does Haiku or Flash match Sonnet/Pro quality for classification tasks?
Use Haiku/Flash for single-label classification with ≤10 classes and clear definitions. Expect <3% quality gap vs frontier. Cost savings: 10-20x. Switch to frontier for multi-label, fuzzy-boundary, or >20-class tasks where the small-model quality cliff is steep.
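The rule of thumb above can be sketched as a small routing function. This is an illustrative sketch only: `ClassificationTask`, `pick_tier`, and the tier labels are hypothetical names, and the `"evaluate"` result for 11-20 classes is an assumption, since the report gives no guidance for that range.

```python
from dataclasses import dataclass


@dataclass
class ClassificationTask:
    """Hypothetical description of a classification workload."""
    num_classes: int
    multi_label: bool = False
    fuzzy_boundaries: bool = False  # classes overlap or lack clear definitions


def pick_tier(task: ClassificationTask) -> str:
    """Return 'small', 'frontier', or 'evaluate' using the report's thresholds."""
    # Multi-label, fuzzy-boundary, or >20-class tasks go to a frontier model.
    if task.multi_label or task.fuzzy_boundaries or task.num_classes > 20:
        return "frontier"
    # Single-label, clearly defined, <=10 classes: a small model is expected to hold up.
    if task.num_classes <= 10:
        return "small"
    # 11-20 classes is not covered by the report; assumption: measure both tiers.
    return "evaluate"


print(pick_tier(ClassificationTask(num_classes=3)))
print(pick_tier(ClassificationTask(num_classes=8, fuzzy_boundaries=True)))
```

The thresholds are starting points, so it is worth checking them against a labeled sample of your own data before committing to a tier.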
Journey Context:
The quality gap between small and frontier models is task-dependent, not uniform. For well-defined classification (sentiment, spam, category), the decision boundary is learnable from the prompt alone: the model only needs to pattern-match. Frontier models add value when classification requires reasoning about context, resolving ambiguity, or synthesizing across multiple signals. Haiku costs ~$0.25/M input tokens vs ~$3/M for Sonnet (12x). For 1M classifications/day, that is $250 vs $3000 per day in input cost, which implies roughly 1,000 input tokens per classification. The 3% quality gap almost never justifies 12x cost for straightforward classification. But at >20 classes, or when classes overlap (e.g., 'feedback' vs 'complaint' vs 'feature request'), small-model accuracy drops 10-15% because small models rely on surface keyword matching rather than intent reasoning.
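The cost arithmetic can be reproduced with a few lines. This is a minimal sketch: `daily_input_cost` is a hypothetical helper, the 1,000 tokens per request is an assumption chosen because it reproduces the $250 and $3000 figures, and output-token cost is ignored. The prices are the approximate ones quoted in this report.

```python
def daily_input_cost(requests_per_day: int,
                     tokens_per_request: int,
                     price_per_million_tokens: float) -> float:
    """Input-token cost in dollars per day (output tokens are not counted)."""
    total_tokens = requests_per_day * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


REQUESTS_PER_DAY = 1_000_000
TOKENS_PER_REQUEST = 1_000  # assumption: implied by the report's totals

haiku_cost = daily_input_cost(REQUESTS_PER_DAY, TOKENS_PER_REQUEST, 0.25)
sonnet_cost = daily_input_cost(REQUESTS_PER_DAY, TOKENS_PER_REQUEST, 3.00)

print(f"Haiku:  ${haiku_cost:,.0f}/day")
print(f"Sonnet: ${sonnet_cost:,.0f}/day")
print(f"Ratio:  {sonnet_cost / haiku_cost:.0f}x")
```

Changing `TOKENS_PER_REQUEST` scales both totals while the 12x ratio stays the same, so the ratio is the more portable number.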
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:08:34.177936+00:00— report_created — created