Report #85046
[cost_intel] Classification and categorization tasks routed to frontier models when small models match quality within 2-5%
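Use a small model such as Claude 3.5 Haiku or Gemini 2.0 Flash for classification and categorization. These models reportedly land within 2-5% of Sonnet/Pro accuracy at a fraction of the cost: up to 10-30x lower depending on the model pair, though the Haiku and Sonnet input prices quoted under Journey Context differ by only about 3x. Implement a cascade: run the small model first and route only low-confidence outputs to a frontier model for adjudication.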
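A minimal sketch of the cascade, assuming each model is wrapped in a function that returns a label plus a confidence score in [0, 1]. The names `classify_with_cascade`, `small_stub`, `frontier_stub`, and the 0.8 threshold are hypothetical and not from the report. How the confidence is obtained (token log-probabilities, a self-reported score, or a calibration layer) is left open here and would need validating against labeled data.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface: a classifier maps text to (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class CascadeResult:
    label: str
    confidence: float
    escalated: bool  # True when the frontier model produced the final label


def classify_with_cascade(
    text: str,
    small: Classifier,
    frontier: Classifier,
    threshold: float = 0.8,  # assumed value; tune on a labeled validation set
) -> CascadeResult:
    """Try the small model first; escalate only low-confidence outputs."""
    label, confidence = small(text)
    if confidence >= threshold:
        return CascadeResult(label, confidence, escalated=False)
    # Low confidence: let the frontier model adjudicate.
    label, confidence = frontier(text)
    return CascadeResult(label, confidence, escalated=True)


# Stub classifiers standing in for real model calls (illustrative only).
def small_stub(text: str) -> Tuple[str, float]:
    if "refund" in text:
        return ("billing", 0.95)
    return ("other", 0.40)


def frontier_stub(text: str) -> Tuple[str, float]:
    return ("account_access", 0.90)


if __name__ == "__main__":
    for item in ["refund please", "cannot log in"]:
        print(item, classify_with_cascade(item, small_stub, frontier_stub))
```

Tracking the `escalated` flag per item gives the escalation rate, which is the number that determines how much of the cost saving survives.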
Journey Context:
Classification is a pattern-matching task at which small models excel, because the decision boundary is well defined and the output space is constrained. The quality gap shows up almost exclusively on ambiguous edge cases where even humans disagree. A cascade captures ~90% of the cost savings while preserving quality on those edge cases. At Sonnet-level pricing (~$3/M input tokens) vs Haiku (~$1/M input tokens), a pipeline classifying 5M items/day at ~200 tokens each processes ~1B input tokens/day, which costs ~$3K/day on Sonnet vs ~$1K/day on Haiku, and the cascade fallback adds maybe $50/day (equivalent to escalating roughly 1-2% of items at the Sonnet rate). These figures count input tokens only. The signature of small-model failure on classification is consistent mislabeling of a specific ambiguous subcategory rather than random errors, which makes it detectable via confidence thresholding.
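The cost arithmetic above can be reproduced with a few lines. This is a rough sketch using only the report's approximate figures (5M items/day, ~200 input tokens each, ~$3/M and ~$1/M input pricing). Output tokens are ignored, and the function names and the escalation rates tried are illustrative assumptions.

```python
ITEMS_PER_DAY = 5_000_000
TOKENS_PER_ITEM = 200
FRONTIER_PRICE = 3.0  # USD per million input tokens (report's approximate Sonnet figure)
SMALL_PRICE = 1.0     # USD per million input tokens (report's approximate Haiku figure)


def daily_cost_usd(price_per_million: float) -> float:
    """Input-token cost of sending every item to a single model."""
    tokens = ITEMS_PER_DAY * TOKENS_PER_ITEM
    return tokens / 1_000_000 * price_per_million


frontier_only = daily_cost_usd(FRONTIER_PRICE)
small_only = daily_cost_usd(SMALL_PRICE)


def cascade_cost_usd(escalation_rate: float) -> float:
    """Every item hits the small model; a fraction is re-run on the frontier model."""
    return small_only + escalation_rate * frontier_only


def savings_captured(escalation_rate: float) -> float:
    """Share of the (frontier_only - small_only) saving the cascade retains."""
    return (frontier_only - cascade_cost_usd(escalation_rate)) / (
        frontier_only - small_only
    )


if __name__ == "__main__":
    print(f"frontier only: ${frontier_only:,.0f}/day")
    print(f"small only:    ${small_only:,.0f}/day")
    for rate in (0.01, 0.05, 0.10):  # illustrative escalation rates
        print(
            f"escalate {rate:.0%}: ${cascade_cost_usd(rate):,.0f}/day, "
            f"{savings_captured(rate):.1%} of savings kept"
        )
```

Under these assumptions the fallback cost scales linearly with the escalation rate, so the share of savings retained depends almost entirely on how often the small model falls below the confidence threshold.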
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:20:10.966629+00:00— report_created — created