Report #43974
[cost_intel] Small model accuracy cliff on multi-class classification vs binary
Use Claude 3.5 Haiku or Gemini Flash 1.5 for binary or 3-way classification with clear label definitions (roughly 10-12x cheaper than Sonnet/Pro), but switch to frontier models when the class count exceeds 10 or the decision boundaries are fuzzy.
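The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested policy: the tier names, the `ClassificationTask` type, and `choose_tier` are hypothetical. The report only covers 2-3 classes and more than 10, so sending 4-10 classes to the small tier is an assumption of this sketch.

```python
from dataclasses import dataclass

# Hypothetical tier names; only the 10-class cutoff comes from the report.
SMALL_TIER = "small"        # e.g. a Haiku- or Flash-class model
FRONTIER_TIER = "frontier"  # e.g. a Sonnet- or Pro-class model
MAX_SMALL_MODEL_CLASSES = 10


@dataclass(frozen=True)
class ClassificationTask:
    num_classes: int
    fuzzy_boundaries: bool  # a human judgment about how clear the labels are


def choose_tier(task: ClassificationTask) -> str:
    """Pick the small tier only for few, clearly defined classes."""
    if task.num_classes < 2:
        raise ValueError("classification needs at least two classes")
    # Assumption: 4-10 classes with clear labels still go to the small tier.
    if task.fuzzy_boundaries or task.num_classes > MAX_SMALL_MODEL_CLASSES:
        return FRONTIER_TIER
    return SMALL_TIER
```

For example, `choose_tier(ClassificationTask(num_classes=2, fuzzy_boundaries=False))` selects the small tier, while a 12-class task or any task flagged as fuzzy goes to the frontier tier.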
Journey Context:
Teams often default to Sonnet or Pro for all classification 'to be safe,' but internal evaluations show Haiku within 2% of Sonnet's accuracy on binary sentiment or intent classification. The failure mode is not gradual: past roughly 10 classes, small models abruptly start outputting 'unknown' or hedging, likely because they can no longer keep that many labels reliably distinct. The cost difference is 12x (Haiku at $0.25/1M tokens vs Sonnet at $3/1M tokens; $3 / $0.25 = 12). Note that $0.25/1M matches the Claude 3 Haiku input rate; Claude 3.5 Haiku lists higher, so check current pricing before relying on this ratio. Quality signature to monitor: the share of 'Other' labels in the output distribution spiking above 5%.
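One way to watch for that quality signature is to track the share of fallback labels in each batch of predictions. The sketch below assumes predictions arrive as plain strings. The fallback label set and the function names `fallback_rate` and `should_escalate` are illustrative assumptions; only the 5% threshold comes from the report.

```python
from collections import Counter
from typing import Iterable

# Assumed spellings of the fallback labels; adjust to the real label schema.
FALLBACK_LABELS = frozenset({"other", "unknown"})
ALERT_THRESHOLD = 0.05  # the 5% figure quoted in the report


def fallback_rate(labels: Iterable[str]) -> float:
    """Return the fraction of predictions that landed on a fallback label."""
    counts = Counter(label.strip().lower() for label in labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[name] for name in FALLBACK_LABELS) / total


def should_escalate(labels: Iterable[str],
                    threshold: float = ALERT_THRESHOLD) -> bool:
    """Flag a batch whose fallback share is above the threshold."""
    return fallback_rate(labels) > threshold
```

Running `should_escalate` over a rolling window of recent predictions, rather than over the whole history, would make a sudden spike easier to see; the right window size depends on traffic volume.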
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:16:59.070014+00:00— report_created — created