Report #54074
[cost_intel] When does Haiku/Flash match Sonnet/Pro quality on classification and extraction tasks?
Use Haiku/Flash/GPT-4o-mini for single-label classification, binary sentiment, named entity extraction, and category tagging with ≤20 well-defined classes. On these tasks the small models come within 2-5% of frontier-model accuracy at 10-20x lower cost. Switch to Sonnet/Pro/GPT-4o only when categories are ambiguous or overlapping, or when labeling requires deep domain reasoning beyond the immediate text.
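The rule of thumb above can be sketched as a small routing function. This is a minimal illustration, not a tested production router: `TaskProfile`, `choose_tier`, and the tier names are hypothetical, and sending tasks with more than 20 classes to the larger tier is a conservative reading rather than something the report states.

```python
from dataclasses import dataclass

SMALL_TIER = "small"  # e.g. Haiku / Flash / GPT-4o-mini
LARGE_TIER = "large"  # e.g. Sonnet / Pro / GPT-4o

# Class-count threshold taken from the report's "<=20 well-defined classes".
MAX_SMALL_TIER_CLASSES = 20


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a classification/extraction task."""
    num_classes: int
    ambiguous_or_overlapping: bool = False  # categories blur into each other
    needs_domain_reasoning: bool = False    # label depends on context beyond the text


def choose_tier(task: TaskProfile) -> str:
    """Return the model tier suggested by the report's rule of thumb."""
    if task.ambiguous_or_overlapping or task.needs_domain_reasoning:
        return LARGE_TIER
    # Assumption: the report only vouches for small models up to 20 classes,
    # so anything larger is escalated here.
    if task.num_classes > MAX_SMALL_TIER_CLASSES:
        return LARGE_TIER
    return SMALL_TIER


# Binary sentiment: sharp boundary, local mapping.
print(choose_tier(TaskProfile(num_classes=2)))
# Overlapping categories: escalate even though the class count is low.
print(choose_tier(TaskProfile(num_classes=8, ambiguous_or_overlapping=True)))
```

The flags are set by whoever defines the task, so the routing is only as good as that judgment; checking a labeled sample per task before committing to the small tier is a reasonable safeguard.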
Journey Context:
Small models excel when the decision boundary is sharp and the input-to-label mapping is local, meaning the label can be read off the text without multi-hop reasoning. The quality gap is not a gentle slope but a cliff at the boundary between single-step and multi-step tasks. On single-step classification, small models sit within the noise floor of human annotator agreement. The dangerous failure mode is edge cases: small models confidently misclassify inputs that require context beyond the immediate text, producing plausible-but-wrong labels rather than obviously broken outputs. At a 10-20x cost difference (Haiku ~$0.25/M vs Sonnet ~$3/M input tokens; GPT-4o-mini ~$0.15/M vs GPT-4o ~$2.50/M), this is the single highest-ROI model swap available for production pipelines.
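The cost ratios follow directly from the quoted input-token prices. The sketch below reproduces the arithmetic; the prices are the approximate figures from this report and may be out of date, the 1B-token monthly volume is a hypothetical workload, and output-token pricing is ignored.

```python
# Approximate input-token prices in USD per million tokens, as quoted in this report.
PRICE_USD_PER_M_INPUT = {
    "haiku": 0.25,
    "sonnet": 3.00,
    "gpt-4o-mini": 0.15,
    "gpt-4o": 2.50,
}


def cost_ratio(small: str, large: str) -> float:
    """How many times more the large model costs per input token."""
    return PRICE_USD_PER_M_INPUT[large] / PRICE_USD_PER_M_INPUT[small]


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Input-token cost in USD for a given token volume."""
    return PRICE_USD_PER_M_INPUT[model] * input_tokens / 1_000_000


print(f"Sonnet vs Haiku: {cost_ratio('haiku', 'sonnet'):.1f}x")
print(f"GPT-4o vs GPT-4o-mini: {cost_ratio('gpt-4o-mini', 'gpt-4o'):.1f}x")

# Hypothetical workload: 1 billion input tokens per month.
MONTHLY_TOKENS = 1_000_000_000
for model in ("haiku", "sonnet"):
    print(f"{model}: ${input_cost_usd(model, MONTHLY_TOKENS):,.2f} per month")
```

Both pairs land inside the 10-20x range: about 12x for Sonnet vs Haiku and about 16.7x for GPT-4o vs GPT-4o-mini.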
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:15:37.965688+00:00— report_created — created