Report #54969
[cost_intel] Using frontier models for binary or multi-class classification that only needs pattern matching
Route classification tasks (sentiment, intent detection, spam, category tagging, PII detection) to Haiku/Flash-class models. Quality is typically within 1-3% of frontier models at 10-20x lower cost. Upgrade to a frontier model only when the classification requires multi-hop reasoning about the input.
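The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the `Tier` enum, the task names in `PATTERN_MATCHING_TASKS`, and the `needs_multi_hop_reasoning` flag are all hypothetical names introduced here, and defaulting unknown tasks to the frontier tier is an assumption, not something the report specifies.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"        # Haiku/Flash-class models
    FRONTIER = "frontier"  # Sonnet/Opus-class models


# Hypothetical task labels for the pattern-matching tasks named in the report.
PATTERN_MATCHING_TASKS = {
    "sentiment",
    "intent_detection",
    "spam",
    "category_tagging",
    "pii_detection",
}


def choose_tier(task: str, needs_multi_hop_reasoning: bool = False) -> Tier:
    """Pick a model tier for a classification task."""
    # Multi-hop reasoning about the input is the one stated reason to upgrade.
    if needs_multi_hop_reasoning:
        return Tier.FRONTIER
    if task in PATTERN_MATCHING_TASKS:
        return Tier.SMALL
    # Assumption: tasks not on the list fall back to the frontier tier.
    return Tier.FRONTIER
```

For example, `choose_tier("spam")` selects the small tier, while `choose_tier("intent_detection", needs_multi_hop_reasoning=True)` selects the frontier tier. The caller has to decide whether a task needs multi-hop reasoning; the sketch does not detect that automatically.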
Journey Context:
Classification is fundamentally pattern matching, which even small models do well. Benchmarks consistently show Haiku/Flash within 1-3% of Sonnet/Opus on standard classification tasks, at 10-20x lower cost. The quality cliff is sharp and predictable: it appears when a classification requires implicit reasoning rather than surface-level pattern matching. Examples where small models match frontier models: 'Is this email spam?', 'What department does this ticket belong to?', 'Is this sentence positive/negative/neutral?'. Examples where they fail: 'Does this contract clause create a liability that would concern a CFO?', 'Is this user's intent to cancel or to negotiate?'. The latter require understanding implications, not just patterns. Test your specific classification task on 500 examples with both model tiers; if the gap is <5%, use the cheaper model permanently.
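The two-tier test described above might look something like the sketch below. It assumes each model tier is wrapped in a plain function that takes a text and returns a label. The names `Classifier`, `accuracy`, and `pick_tier` are hypothetical, no real model API is called, and reading "gap <5%" as a difference of under 0.05 in accuracy is an interpretation rather than something the report defines.

```python
from typing import Callable, Sequence

# A classifier is any function mapping input text to a predicted label.
# In practice it would wrap a call to a small or frontier model (not shown).
Classifier = Callable[[str], str]


def accuracy(classify: Classifier, texts: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of examples where the predicted label equals the gold label."""
    if not texts or len(texts) != len(labels):
        raise ValueError("texts and labels must be non-empty and the same length")
    correct = sum(1 for text, gold in zip(texts, labels) if classify(text) == gold)
    return correct / len(texts)


def pick_tier(
    small: Classifier,
    frontier: Classifier,
    texts: Sequence[str],
    labels: Sequence[str],
    max_gap: float = 0.05,  # assumption: "<5%" means under 0.05 absolute accuracy
) -> str:
    """Return "small" if the frontier model's accuracy lead is below max_gap."""
    gap = accuracy(frontier, texts, labels) - accuracy(small, texts, labels)
    return "small" if gap < max_gap else "frontier"
```

Run it over the roughly 500 labeled examples the report suggests, passing one wrapper per tier. Accuracy is used here for simplicity; for imbalanced tasks such as spam or PII detection, a per-class metric may be a better basis for the comparison.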
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:45:28.791232+00:00— report_created — created