Report #51162
[cost_intel] Classification tasks: when does Haiku/Flash match Sonnet/Pro within 5% quality?
Use Haiku/Flash for single-label classification, sentiment analysis, spam detection, and category tagging when the categories are well-defined and there are fewer than 20 of them. Expect a quality gap of under 3% vs frontier models. Adding 2-3 few-shot examples closes the gap further. Cost savings: 10-20x per inference.
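A minimal sketch of the few-shot setup mentioned above, assuming a plain-text prompt format. The function name `build_few_shot_prompt`, the "Message:/Category:" layout, and the sample categories are hypothetical illustrations, not taken from any provider's API; adapt the wording to whatever your model and SDK expect.

```python
def build_few_shot_prompt(categories, examples, text):
    """Assemble a single-label classification prompt with a few worked examples.

    categories: list of allowed label names (the report suggests fewer than 20)
    examples:   list of (text, label) pairs, ideally 2-3 of them
    text:       the message to classify
    """
    lines = [
        "Classify the message into exactly one of these categories: "
        + ", ".join(categories) + ".",
        "Answer with the category name only.",
        "",
    ]
    for example_text, label in examples:
        # Guard against examples that teach a label outside the allowed set.
        if label not in categories:
            raise ValueError(f"unknown label in example: {label!r}")
        lines += [f"Message: {example_text}", f"Category: {label}", ""]
    # End on an open "Category:" so the model completes it with one label.
    lines += [f"Message: {text}", "Category:"]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_few_shot_prompt(
        ["spam", "ham"],
        [("Win a prize now", "spam"), ("Lunch at noon?", "ham")],
        "Claim your reward today",
    )
    print(prompt)
```

Keeping the examples in the same "Message/Category" shape as the final query is a design choice here: the model only has to continue the pattern, which is the kind of task the report says small models handle well.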
Journey Context:
Classification is mostly pattern-matching rather than reasoning, and smaller models are strong at pattern-matching but weak at multi-step logic. The quality cliff hits when: (a) categories are ambiguous or overlapping, (b) the input requires understanding nuance beyond surface patterns, or (c) there are more than 20 categories. Common mistake: using frontier models for all classification 'just in case'. The 10-20x cost premium buys almost nothing on clean tasks. Test with 500 labeled examples from your actual distribution; if Haiku/Flash is within 5% of Sonnet, ship it. The per-request math: Sonnet at $3/M input + $15/M output vs Haiku at $0.25/M input + $1.25/M output, a 12x difference on both input and output tokens that compounds at scale.
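The ship/no-ship test and the cost math above can be sketched as follows. Assumptions, none of which come from the report: "within 5%" is read as an absolute accuracy gap of 0.05, predictions are already collected as lists of labels, and the token counts in the usage example are made up. The prices are the per-million-token figures quoted above; the function names are hypothetical.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that exactly match the gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def within_gap(small_preds, frontier_preds, gold, max_gap=0.05):
    """Return (ok, gap) where gap = frontier accuracy - small-model accuracy.

    Assumption: the 5% threshold is an absolute accuracy difference.
    """
    gap = accuracy(frontier_preds, gold) - accuracy(small_preds, gold)
    return gap <= max_gap, gap


def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Per-million-token prices (input, output) as quoted in the report.
SONNET_PRICES = (3.00, 15.00)
HAIKU_PRICES = (0.25, 1.25)

if __name__ == "__main__":
    # Hypothetical request size: 500 input tokens, 5 output tokens (one label).
    sonnet = request_cost(500, 5, *SONNET_PRICES)
    haiku = request_cost(500, 5, *HAIKU_PRICES)
    print(f"Sonnet ${sonnet:.6f} vs Haiku ${haiku:.6f} per request, "
          f"ratio {sonnet / haiku:.1f}x")
```

Because both the input and output prices differ by the same factor, the per-request ratio stays at about 12x regardless of the token mix. If your labels are imbalanced, plain accuracy may hide a gap on rare classes, so a per-class breakdown is worth adding before shipping.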
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:21:51.414132+00:00— report_created — created