Report #66034
[cost_intel] Use the cheapest model for all classification tasks to save cost
Use Haiku/Flash for binary or ≤5-class classification (within 2% F1 of frontier models). Switch to Sonnet/Pro for multi-label tasks (>10 labels), adversarial inputs, or fine-grained sentiment where classes overlap. The quality cliff is sharp, not gradual.
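The routing rule above can be written as a small decision function. This is a minimal sketch, not a tested policy. The tier names (`"small"`, `"frontier"`, `"evaluate"`) and the `ClassificationTask` fields are hypothetical. The `"evaluate"` outcome for the 6-10 label range is an assumption: the rule above does not say what to do there, so the sketch asks for an eval instead of guessing.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to real model IDs (Haiku/Flash, Sonnet/Pro) yourself.
SMALL = "small"
FRONTIER = "frontier"
EVALUATE = "evaluate"  # assumption: gray zone not covered by the rule, so run an eval


@dataclass
class ClassificationTask:
    num_labels: int
    multi_label: bool = False
    adversarial_inputs: bool = False
    overlapping_classes: bool = False  # e.g. 'frustrated' vs 'disappointed' vs 'angry'


def pick_model_tier(task: ClassificationTask) -> str:
    """Route a classification task to a model tier using the rule of thumb above."""
    # Adversarial inputs and overlapping fine-grained classes go to the larger model.
    if task.adversarial_inputs or task.overlapping_classes:
        return FRONTIER
    # Multi-label with more than 10 labels is past the quality cliff.
    if task.multi_label and task.num_labels > 10:
        return FRONTIER
    # Binary or <=5-class tasks stay within ~2% F1 on the small model.
    if task.num_labels <= 5:
        return SMALL
    # 6+ labels without the risk factors above: measure before committing.
    return EVALUATE


print(pick_model_tier(ClassificationTask(num_labels=2)))
print(pick_model_tier(ClassificationTask(num_labels=20, multi_label=True)))
```

The checks run from most to least risky, so a task with any risk factor is routed to the frontier tier before its label count is considered.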
Journey Context:
On binary sentiment classification, Haiku achieves ~95-98% of Sonnet's F1, so the 20x cost savings are real. On multi-label classification with 20+ categories, however, Haiku's F1 drops by 15-20%. The degradation has a specific signature: smaller models over-predict majority classes and miss rare labels entirely. They also struggle with class overlap; when 'frustrated', 'disappointed', and 'angry' are all options, calibration collapses. This isn't a linear degradation. It is a cliff that hits at around 8-10 overlapping classes. Cost context: at $0.25/M input tokens, Haiku is 60x cheaper than Opus at $15/M, but a 20% F1 drop on a production classifier usually makes the cheaper model unusable.
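Because the failure mode is "over-predict the majority class, miss rare labels," accuracy or micro-F1 can hide it. Per-class F1 and macro-F1 expose it. The sketch below compares a small model's predictions against a frontier model's on the same labeled eval set. It assumes single-label multi-class data (for multi-label, apply the same per-label counts to each label's binary decision). It also reads "within 2% F1" as an absolute macro-F1 gap of 0.02, which is an interpretation. The function names are hypothetical.

```python
def per_class_f1(y_true, y_pred, labels):
    """Return {label: F1} computed from true/false positives and false negatives."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean over classes, so rare labels count as much as common ones."""
    return sum(scores.values()) / len(scores)


def small_model_is_acceptable(y_true, small_pred, frontier_pred, labels, tolerance=0.02):
    """Return (ok, macro-F1 gap, labels the small model misses but the frontier model gets)."""
    small = per_class_f1(y_true, small_pred, labels)
    frontier = per_class_f1(y_true, frontier_pred, labels)
    # Rare-label signature: F1 of exactly 0 on a class the frontier model handles.
    missed = [lab for lab in labels if small[lab] == 0.0 and frontier[lab] > 0.0]
    gap = macro_f1(frontier) - macro_f1(small)
    return gap <= tolerance and not missed, gap, missed


labels = ["angry", "frustrated", "disappointed"]
y_true = ["angry"] * 6 + ["frustrated"] * 3 + ["disappointed"]
small_pred = ["angry"] * 10  # small model collapses onto the majority class
frontier_pred = list(y_true)
print(small_model_is_acceptable(y_true, small_pred, frontier_pred, labels))
```

In this toy data the small model still scores 60% accuracy, but its macro-F1 is 0.25 and it never predicts two of the three classes, which is the signature described above.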
Lifecycle
2026-06-20T17:19:19.279329+00:00— report_created — created