Report #64105
[cost\_intel] Using frontier models for all classification tasks regardless of category complexity and input structure
Route binary and multi-class classification with <20 well-defined categories and <2K token inputs to Haiku/Flash/GPT-4o-mini. Reserve frontier models for classification requiring deep contextual reasoning, ambiguous overlapping categories, or inputs >4K tokens.
Journey Context:
On straightforward classification \(sentiment, spam, intent tagging with clear labels\), mid-tier models match frontier within 2-5% accuracy at ~20x lower cost per token. The quality cliff is sharp and predictable: when categories overlap semantically \(e.g., 'complaint' vs 'feedback', 'urgent' vs 'important'\), smaller models default to the majority class or flip between confusable labels. The diagnostic: run a confusion matrix on edge cases. If off-diagonal errors cluster between semantically adjacent categories, you have hit the mid-tier cliff and must upgrade. If errors are distributed randomly, the model is just underfitting and needs better prompts, not a bigger model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:05:00.259640+00:00— report_created — created