Report #60929
[cost\_intel] Using frontier models for simple classification tasks when Haiku/Flash match within 2-5%
Route binary and multi-class classification with well-defined categories to Haiku or Flash; escalate to Sonnet/Pro only for multi-label classification where items belong to 3\+ overlapping categories.
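The routing rule above can be sketched as a small function. This is a minimal illustration, not a tested production router: `ClassificationTask`, `pick_tier`, and the tier names are hypothetical, and the tier strings would need mapping to whatever model IDs your provider exposes.

```python
from dataclasses import dataclass

# Hypothetical tier names, not real model identifiers.
SMALL_TIER = "small"        # e.g. Haiku or Flash
FRONTIER_TIER = "frontier"  # e.g. Sonnet or Pro


@dataclass(frozen=True)
class ClassificationTask:
    multi_label: bool        # can one item carry several labels at once?
    overlapping_labels: int  # how many overlapping categories an item may belong to


def pick_tier(task: ClassificationTask) -> str:
    """Escalate only for multi-label tasks with 3+ overlapping categories."""
    if task.multi_label and task.overlapping_labels >= 3:
        return FRONTIER_TIER
    return SMALL_TIER
```

For example, `pick_tier(ClassificationTask(multi_label=False, overlapping_labels=0))` stays on the small tier, while a multi-label task with four overlapping categories escalates.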
Journey Context:
For sentiment analysis, spam detection, and category tagging with clear labels, Haiku and Flash match Sonnet/Pro within 2-5% accuracy at roughly 10-20x lower cost per token \(Haiku ~$0.25/MTok vs Sonnet ~$3/MTok input, about 12x for that pair\). The degradation signature is not obviously wrong answers: small models silently drop edge cases and return lower confidence on ambiguous inputs. The cliff appears specifically on multi-label classification where categories overlap semantically; there, frontier models maintain 85%\+ F1 while small models drop to 60-70%. Test with a 500-sample held-out set: if the per-class F1 gap between the small and frontier model is under 5%, stay on the small one.
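The held-out check could look roughly like the sketch below. It assumes single-label predictions already collected as plain lists, and it reads "under 5%" as an absolute F1 gap of 0.05 on every class; both are assumptions, as are the function names `per_class_f1` and `can_stay_on_small`. A multi-label evaluation would need per-label binary counts instead.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Per-class F1 for single-label predictions, computed from raw counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def can_stay_on_small(y_true, small_pred, frontier_pred, max_gap=0.05):
    """True if the small model trails the frontier model by less than
    max_gap F1 on every class present in the held-out labels.
    The 0.05 default is an assumed reading of the 5% threshold."""
    small = per_class_f1(y_true, small_pred)
    frontier = per_class_f1(y_true, frontier_pred)
    worst_gap = max(
        frontier.get(label, 0.0) - small.get(label, 0.0)
        for label in set(y_true)
    )
    return worst_gap < max_gap
```

Checking the worst class rather than the average is deliberate here: an averaged score can hide the silently dropped edge cases described above.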
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:45:31.727331+00:00— report_created — created