Report #83076
[cost_intel] Using frontier models for simple classification tasks
Use Haiku/Flash/GPT-4o-mini for binary and multi-class classification with clear category boundaries—they match frontier model quality within 2-5% at 10-20x lower cost. Reserve Sonnet/GPT-4 for subjective or multi-label classification where boundaries are ambiguous.
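The rule of thumb above can be written down as a tiny routing function. This is a minimal sketch, not a tested policy: `ClassificationTask`, `pick_tier`, and the tier names `"small"` and `"frontier"` are hypothetical names introduced for illustration, and the two boolean flags are a simplification of "clear boundaries" versus "subjective or multi-label".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationTask:
    """Hypothetical description of a classification workload."""
    multi_label: bool            # an item may receive more than one label
    ambiguous_boundaries: bool   # categories are subjective or overlap


def pick_tier(task: ClassificationTask) -> str:
    """Return "frontier" for multi-label or ambiguous tasks, else "small".

    "small" stands for the Haiku/Flash/GPT-4o-mini class of models and
    "frontier" for the Sonnet/GPT-4 class; both names are illustrative.
    """
    if task.multi_label or task.ambiguous_boundaries:
        return "frontier"
    return "small"
```

For example, a spam filter would be described as `ClassificationTask(multi_label=False, ambiguous_boundaries=False)` and routed to the small tier, while a tone classifier with fuzzy categories would go to the frontier tier.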
Journey Context:
The quality cliff for smaller models on classification isn't gradual; it's binary. If the classification rules fit in one paragraph (spam detection, sentiment, ticket routing to <10 categories), smaller models nail it. The cliff appears when classification requires weighing competing criteria or reading implicit social context ('is this email passive-aggressive?'). Common mistake: defaulting to GPT-4/Sonnet for all classification 'just in case,' which multiplies the cost 10-20x for zero quality gain on simple cases. At 1M+ classifications/month, this is the difference between $500 and $10,000. The degradation signature of smaller models on too-hard classification is confident wrong answers rather than hedging, so you won't catch it without ground-truth evaluation.
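Since the failure mode is only visible against labeled data, a ground-truth check on a held-out sample is one way to decide whether the smaller model is safe for a given task. The sketch below is illustrative only: the function names are hypothetical, the 0.05 tolerance assumes the "2-5%" figure means absolute accuracy (an assumption, the report does not say), and the per-thousand prices are back-derived from the $500 and $10,000 figures rather than taken from any vendor price list.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the ground-truth labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def small_model_is_good_enough(
    small_preds: Sequence[str],
    frontier_preds: Sequence[str],
    gold: Sequence[str],
    max_gap: float = 0.05,  # assumption: tolerated gap in absolute accuracy
) -> bool:
    """True if the small model trails the frontier model by at most max_gap."""
    gap = accuracy(frontier_preds, gold) - accuracy(small_preds, gold)
    return gap <= max_gap


def monthly_cost(classifications: int, usd_per_thousand: float) -> float:
    """Monthly spend for a given volume and a price per 1,000 classifications."""
    return classifications / 1000 * usd_per_thousand


if __name__ == "__main__":
    # Illustrative prices implied by the report: $0.50 vs $10.00 per 1,000 calls.
    print(monthly_cost(1_000_000, 0.50))   # 500.0
    print(monthly_cost(1_000_000, 10.00))  # 10000.0
```

Because overloaded small models answer wrongly with confidence, the labels in `gold` should come from human review, not from either model's output.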
Lifecycle
2026-06-21T22:01:41.365915+00:00— report_created — created