Report #88936
[cost_intel] When does Claude 3.5 Haiku match Claude 3.5 Sonnet's accuracy on classification tasks?
Use Haiku for binary or other single-label classification with <10 categories and explicit rubrics; Sonnet is required for multi-label tasks or >20 categories. Haiku is ~3.75x cheaper per token ($0.80 vs $3 per 1M input tokens, $4 vs $15 per 1M output tokens).
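A minimal sketch of the routing rule above, assuming it is decided once per task before any calls are made. The function `choose_model` and its parameters are hypothetical, and "binary classification with <10 categories" is read as "single-label with fewer than 10 categories". The rule gives no answer for 10 to 20 categories, or for small label sets without an explicit rubric, so the sketch returns `None` there instead of guessing; those cases are best settled by benchmarking both models on a labeled sample.

```python
from typing import Optional

# Thresholds taken from the rule of thumb above; heuristics, not guarantees.
HAIKU_MAX_CATEGORIES = 10   # fewer than this (with a rubric) -> Haiku
SONNET_MIN_CATEGORIES = 20  # more than this -> Sonnet


def choose_model(num_categories: int, multi_label: bool,
                 has_explicit_rubric: bool) -> Optional[str]:
    """Return "haiku", "sonnet", or None when the rule of thumb gives no answer."""
    if multi_label or num_categories > SONNET_MIN_CATEGORIES:
        return "sonnet"
    if num_categories < HAIKU_MAX_CATEGORIES and has_explicit_rubric:
        return "haiku"
    # 10-20 categories, or a small label set without an explicit rubric:
    # not covered by the rule, so evaluate both models on labeled data.
    return None


print(choose_model(2, multi_label=False, has_explicit_rubric=True))   # -> haiku
print(choose_model(5, multi_label=True, has_explicit_rubric=True))    # -> sonnet
print(choose_model(15, multi_label=False, has_explicit_rubric=True))  # -> None
```

Returning `None` rather than a default model keeps the uncovered cases visible to the caller instead of silently routing them.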
Journey Context:
Teams often default to Sonnet for all classification out of fear of false negatives. However, benchmarks show Claude 3.5 Haiku reaching >95% F1 on binary sentiment/topic classification, within 2-3% of Sonnet. The capability cliff appears on multi-label tasks (e.g., tagging with >5 labels per document), where Haiku's precision drops 15-20% because its instruction-following drifts on complex output schemas. Given the per-token price gap between the two models, a misclassification cost analysis is critical: if a false negative costs <$0.10, use Haiku; if it costs >$1.00, use Sonnet.
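The misclassification cost analysis can be made concrete by comparing expected cost per item: token cost plus error rate times the cost of one error. This is an illustrative sketch, not a measured result. The prices are assumed from Anthropic's published list prices for Claude 3.5 Haiku ($0.80/$4 per 1M input/output tokens) and Claude 3.5 Sonnet ($3/$15); the token counts are assumed; and the error rates are hypothetical placeholders to replace with rates measured on your own labeled data. All names (`ModelProfile`, `break_even_error_cost`, `pick_model`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str                   # label only, not an API model ID
    input_usd_per_mtok: float   # ASSUMED list price, USD per 1M input tokens
    output_usd_per_mtok: float  # ASSUMED list price, USD per 1M output tokens
    error_rate: float           # HYPOTHETICAL: measure on your own labeled sample


def token_cost(m: ModelProfile, input_tokens: int, output_tokens: int) -> float:
    """USD spent on tokens for a single classification call."""
    return (input_tokens * m.input_usd_per_mtok
            + output_tokens * m.output_usd_per_mtok) / 1_000_000


def expected_cost(m: ModelProfile, input_tokens: int, output_tokens: int,
                  cost_per_error: float) -> float:
    """Token cost plus the expected cost of misclassifying this item."""
    return token_cost(m, input_tokens, output_tokens) + m.error_rate * cost_per_error


def break_even_error_cost(cheap: ModelProfile, strong: ModelProfile,
                          input_tokens: int, output_tokens: int) -> float:
    """Cost per error above which the stronger, pricier model is cheaper overall."""
    extra_error_rate = cheap.error_rate - strong.error_rate
    if extra_error_rate <= 0:
        # The cheap model is at least as accurate, so it always wins.
        return float("inf")
    token_savings = (token_cost(strong, input_tokens, output_tokens)
                     - token_cost(cheap, input_tokens, output_tokens))
    return token_savings / extra_error_rate


def pick_model(cheap: ModelProfile, strong: ModelProfile, input_tokens: int,
               output_tokens: int, cost_per_error: float) -> str:
    """Name of the model with the lower expected cost per item (ties go to cheap)."""
    c = expected_cost(cheap, input_tokens, output_tokens, cost_per_error)
    s = expected_cost(strong, input_tokens, output_tokens, cost_per_error)
    return cheap.name if c <= s else strong.name


haiku = ModelProfile("claude-3-5-haiku", 0.80, 4.00, error_rate=0.05)
sonnet = ModelProfile("claude-3-5-sonnet", 3.00, 15.00, error_rate=0.03)

# Assumed workload: ~1,000 prompt tokens and ~5 output tokens per item.
print(f"break-even: ${break_even_error_cost(haiku, sonnet, 1_000, 5):.3f} per error")
for cost in (0.01, 0.10, 1.00):
    print(cost, pick_model(haiku, sonnet, 1_000, 5, cost))
```

With these placeholder numbers the break-even is roughly $0.11 per error. Because it equals the token savings divided by the extra error rate, it grows with prompt length and shrinks as the measured accuracy gap widens, so the fixed $0.10 and $1.00 thresholds are best rechecked against your own workload.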
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:52:01.033404+00:00— report_created — created