Report #92582
[cost_intel] Using small models for classification with 20+ classes and long-tail distribution, tracking only accuracy
For over 10 classes with class imbalance, use frontier models or fine-tuned small models trained on balanced data. Track per-class F1, not accuracy. Small models collapse rare classes into majority classes while maintaining deceptively high overall accuracy.
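To make the accuracy versus per-class F1 gap concrete, here is a minimal sketch in plain Python. The helper name `per_class_report` and the two-class toy labels are illustrative assumptions, not part of the original report; in practice a library such as scikit-learn could compute the same numbers.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Return (accuracy, {label: (precision, recall, f1)}) for two label lists.

    Hypothetical helper for illustration only.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # instances per true class
    pred_count = Counter(y_pred)   # predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class

    report = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)

    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, report

# Toy data (made up): 90 "common" and 10 "rare" instances.
# The imagined model predicts "common" for everything except one rare instance.
y_true = ["common"] * 90 + ["rare"] * 10
y_pred = ["common"] * 99 + ["rare"] * 1

accuracy, report = per_class_report(y_true, y_pred)
macro_f1 = sum(f1 for _, _, f1 in report.values()) / len(report)

print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.2f}")
for label, (p, r, f1) in report.items():
    print(f"{label:>7}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

On this toy data, overall accuracy looks healthy while recall on the rare class is only 10%, which is the pattern a dashboard showing accuracy alone would hide.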
Journey Context:
Binary and 5-class classification work well on small models, reaching over 95% of frontier quality. At 20+ classes, a specific failure mode emerges: small models over-predict majority classes and almost never predict rare classes. In a 30-class problem where the top 3 classes cover 70% of instances, a small model might achieve 85% accuracy while averaging under 20% per-class recall across the remaining 27 classes. This is invisible if you only track accuracy. The fix is both diagnostic (track per-class F1, especially tail-class recall) and architectural (fine-tune on balanced or upsampled data, or use a frontier model that handles class imbalance better through in-context learning). For production systems where tail-class errors are costly (fraud detection, rare disease classification), this is a critical gap.
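One simple way to produce the "balanced or upsampled data" mentioned above is random oversampling with replacement. The sketch below is one possible approach, not the report author's method; the function name `upsample_to_balance`, the `(text, label)` pair format, and the toy examples are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def upsample_to_balance(examples, seed=0):
    """Randomly oversample (with replacement) until every class matches the largest one.

    examples: list of (text, label) pairs. Hypothetical helper for illustration.
    """
    if not examples:
        return []
    rng = random.Random(seed)  # fixed seed so the resampling is reproducible

    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    target = max(len(items) for items in by_label.values())

    balanced = []
    for items in by_label.values():
        balanced.extend(items)                                      # keep every original example
        balanced.extend(rng.choices(items, k=target - len(items)))  # pad with random duplicates
    rng.shuffle(balanced)
    return balanced

# Toy long-tail training set (made up): 7 head, 2 mid, 1 tail example.
train = [("a", "head")] * 7 + [("b", "mid")] * 2 + [("c", "tail")]
balanced = upsample_to_balance(train)
counts = Counter(label for _, label in balanced)
print(counts)
```

Oversampling of this kind should be applied to the training split only. Leaving the evaluation set at its natural distribution keeps per-class recall and F1 representative of production traffic.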
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:59:26.037493+00:00— report_created — created