Report #57757
[cost_intel] Defaulting to frontier models for all classification tasks regardless of category complexity
Use smaller models for classification with fewer than 10 mutually exclusive categories and clear definitions. Switch to frontier models when categories overlap, require context interpretation, or exceed roughly 15 options. For mixed workloads, use a two-stage pipeline with confidence-based escalation.
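As a rough illustration, the rule of thumb above could be encoded as a small routing helper. This is a minimal sketch, not a tested policy: `TaskProfile`, `choose_strategy`, and the strategy names are hypothetical, and sending the 10-15 category gray zone to the two-stage pipeline is an assumption, since the guidance above does not say what to do in that range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a classification task."""
    num_categories: int
    mutually_exclusive: bool = True   # False when categories overlap
    clear_definitions: bool = True
    needs_context: bool = False       # True when labels need context interpretation
    mixed_workload: bool = False      # True when easy and hard items arrive together


def choose_strategy(task: TaskProfile) -> str:
    """Return 'small', 'frontier', or 'two_stage' for a task profile."""
    if task.mixed_workload:
        return "two_stage"

    hard = (
        task.num_categories > 15
        or not task.mutually_exclusive
        or not task.clear_definitions
        or task.needs_context
    )
    if hard:
        return "frontier"

    if task.num_categories < 10:
        return "small"

    # Assumption: 10-15 clean categories is a gray zone, so hedge with escalation.
    return "two_stage"
```

For example, a three-way sentiment task would map to `small`, while a five-category task that needs context interpretation would map to `frontier`.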
Journey Context:
Simple classification (sentiment as positive/negative/neutral, spam/ham, issue type with 5-8 categories) is effectively solved: smaller models reach 95%+ accuracy, matching frontier models. The cost gap on input tokens is 60x: Haiku at $0.25/M versus Opus at $15/M. Smaller models degrade past a complexity threshold: more than 10-15 categories where subtle distinctions matter; categories that require understanding context rather than matching keywords (e.g., 'is this email genuinely urgent or just marked urgent by the sender'); and multi-label classification, where an item belongs to several categories at once. The degradation signature on hard classification is majority-class bias: the smaller model defaults to the most frequent category instead of reasoning about edge cases. The two-stage fix: run the cheap model first and escalate to the frontier model only when the smaller model's confidence, taken from its logprobs or a structured confidence field, falls below a threshold. This typically keeps 70-80% of volume on the cheap model.
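The two-stage escalation described above might look like the following sketch. It makes no assumptions about any provider SDK: the two models are passed in as plain callables returning `(label, confidence)`, and every name here (`two_stage_classify`, `confidence_from_logprob`, `blended_input_cost`, the `0.8` threshold) is hypothetical and would need tuning against labeled data.

```python
import math
from typing import Callable, Tuple

# Hypothetical interface: a classifier maps text to (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


def confidence_from_logprob(logprob: float) -> float:
    """Convert the log-probability of the chosen label token to a probability."""
    return math.exp(logprob)


def two_stage_classify(
    text: str,
    cheap: Classifier,
    frontier: Classifier,
    threshold: float = 0.8,  # assumed value, not taken from the report
) -> Tuple[str, str]:
    """Return (label, tier), where tier records which model produced the label."""
    label, confidence = cheap(text)
    if confidence >= threshold:
        return label, "cheap"
    # Low confidence: escalate and trust the frontier model's answer.
    label, _ = frontier(text)
    return label, "frontier"


def blended_input_cost(
    cheap_price: float, frontier_price: float, escalation_rate: float
) -> float:
    """Input cost per million tokens for the pipeline.

    Every item is seen by the cheap model; escalated items are also seen by
    the frontier model. Assumes both calls consume the same input tokens and
    ignores output tokens.
    """
    return cheap_price + escalation_rate * frontier_price
```

Under these assumptions, with the prices quoted above and 25% of items escalated, `blended_input_cost(0.25, 15.0, 0.25)` comes to $4.00/M input versus $15/M for a frontier-only setup. The saving shrinks as the escalation rate rises, so the threshold is worth calibrating on a held-out sample instead of guessing.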
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T03:26:01.175854+00:00— report_created — created