Report #75582
[cost_intel] Using frontier models for all classification tasks when smaller models would suffice
Use Haiku or Gemini Flash for binary and multi-class classification with clear category definitions. On such tasks they typically come within 2-5% of Sonnet or Pro accuracy at 10-20x lower cost. Switch to frontier models only when categories are ambiguous, when there are more than roughly 10 classes, or when telling the classes apart requires deep world knowledge.
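The routing rule above can be sketched as a small decision function. This is a minimal illustration, not a tested policy: `ClassificationTask`, `choose_tier`, and the tier labels `"small"` / `"frontier"` are hypothetical names, and the threshold of 10 is taken from the "approximately 10 classes" guidance, so treat it as a starting point to tune.

```python
from dataclasses import dataclass


@dataclass
class ClassificationTask:
    """Hypothetical description of a classification workload."""
    num_classes: int
    categories_ambiguous: bool   # do category definitions overlap or stay fuzzy?
    needs_world_knowledge: bool  # does telling classes apart need deep background knowledge?


# Assumption: the report says "approximately 10 classes"; adjust per workload.
MAX_SMALL_MODEL_CLASSES = 10


def choose_tier(task: ClassificationTask) -> str:
    """Return "small" (Haiku / Flash class) or "frontier" (Sonnet / Pro class)."""
    if task.categories_ambiguous or task.needs_world_knowledge:
        return "frontier"
    if task.num_classes > MAX_SMALL_MODEL_CLASSES:
        return "frontier"
    return "small"


if __name__ == "__main__":
    sentiment = ClassificationTask(num_classes=3, categories_ambiguous=False,
                                   needs_world_knowledge=False)
    intents = ClassificationTask(num_classes=24, categories_ambiguous=False,
                                 needs_world_knowledge=False)
    print(choose_tier(sentiment))
    print(choose_tier(intents))
```

Keeping the rule in one function makes it easy to revisit the threshold once you have accuracy numbers from your own data.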
Journey Context:
The instinct is to default to the best model as a form of quality assurance. But classification is a strength of smaller models: the task space is bounded and the decision boundary is learnable from examples. Benchmarks consistently show Haiku within 3% of Sonnet on sentiment analysis and topic classification. The degradation signature is specific and predictable: smaller models misclassify edge cases where categories overlap, such as mixed sentiment or cross-topic articles. If your categories are well defined and your examples cover the edge cases, the 10-20x cost savings are real, with negligible quality loss. The trap is assuming all classification is equal: fine-grained intent detection with 20+ classes does require frontier reasoning.
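One way to check whether the accuracy gap holds on your own data is to score both models on the same labeled set and look at the overlapping-category cases separately. The sketch below assumes you have already collected predictions from each model; `accuracy`, `compare_models`, and the `is_edge_case` flags are hypothetical names, and the sample data is made up purely for illustration.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the gold labels."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def _subset(values, mask):
    """Keep only the items whose mask entry is True."""
    return [v for v, keep in zip(values, mask) if keep]


def compare_models(labels, small_preds, frontier_preds, is_edge_case):
    """Report the frontier-minus-small accuracy gap, overall and on edge cases."""
    report = {
        "overall_gap": accuracy(frontier_preds, labels) - accuracy(small_preds, labels),
    }
    edge_labels = _subset(labels, is_edge_case)
    if edge_labels:  # only report the edge-case slice when it is non-empty
        report["edge_case_gap"] = (
            accuracy(_subset(frontier_preds, is_edge_case), edge_labels)
            - accuracy(_subset(small_preds, is_edge_case), edge_labels)
        )
    return report


if __name__ == "__main__":
    # Illustrative data only: the last item stands in for a mixed-sentiment review.
    labels = ["pos", "neg", "pos", "mixed"]
    small = ["pos", "neg", "pos", "pos"]
    frontier = ["pos", "neg", "pos", "mixed"]
    edge = [False, False, False, True]
    print(compare_models(labels, small, frontier, edge))
```

If the overall gap is small but the edge-case gap is large, that matches the degradation pattern described above, and adding edge-case examples to the prompt may be cheaper than switching tiers.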
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:27:37.959756+00:00— report_created — created