Report #85700
[cost_intel] When does Claude 3 Haiku match Sonnet performance on classification tasks?
Use Haiku for multiple-choice classification and binary sentiment tasks: it matches Sonnet to within 2-3% accuracy at roughly 1/12th the input-token cost. Switch to Sonnet only for open-ended generation or reasoning-heavy classification.
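The routing rule above can be sketched as a small Python function. This is a minimal sketch under stated assumptions, not a prescribed method: the task labels in `HAIKU_TASKS`, the `needs_reasoning` flag, and the 3-point `max_gap` threshold are hypothetical choices, and "within 2-3%" is read here as an absolute accuracy gap measured on your own labeled sample. The model IDs are the published Claude 3 Haiku and Claude 3 Sonnet identifiers.

```python
from __future__ import annotations

# Claude 3 model identifiers.
HAIKU = "claude-3-haiku-20240307"
SONNET = "claude-3-sonnet-20240229"

# Hypothetical task labels treated as recognition-style; adapt to your own taxonomy.
HAIKU_TASKS = {"multiple_choice", "binary_sentiment"}


def accuracy(preds: list[str], gold: list[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if not gold or len(preds) != len(gold):
        raise ValueError("preds and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def choose_model(task_type: str, needs_reasoning: bool,
                 haiku_acc: float | None = None,
                 sonnet_acc: float | None = None,
                 max_gap: float = 0.03) -> str:
    """Route a classification task to Haiku or Sonnet.

    Open-ended or reasoning-heavy tasks always go to Sonnet. Recognition-style
    tasks go to Haiku unless a measured accuracy gap on a labeled sample
    (Sonnet minus Haiku, absolute) exceeds max_gap.
    """
    if needs_reasoning or task_type not in HAIKU_TASKS:
        return SONNET
    if haiku_acc is not None and sonnet_acc is not None:
        if sonnet_acc - haiku_acc > max_gap:
            return SONNET
    return HAIKU
```

For example, `choose_model("binary_sentiment", False, haiku_acc=0.95, sonnet_acc=0.97)` returns the Haiku ID because the 2-point gap is under the threshold, while any task flagged `needs_reasoning=True` goes to Sonnet regardless of measured accuracy.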
Journey Context:
Teams often default to Sonnet for all classification, assuming that a smaller model means an unacceptable drop in quality. But Anthropic's evals show Haiku reaching ~98% of Sonnet's accuracy on MMLU and other multiple-choice tasks. The usual explanation is that classification is closer to "recognition" and needs less multi-step reasoning than generation does. The price gap is 12x on input tokens ($0.25 vs $3 per 1M tokens). The cliff appears when the task requires chain-of-thought reasoning before answering: Haiku's reasoning is shallower, so errors cascade in multi-hop classification.
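A minimal sketch of the cost arithmetic, using only the input-token prices quoted above; the 10M-token volume is a hypothetical workload. It shows that $3 / $0.25 works out to a 12x ratio.

```python
# Input-token prices quoted in the report, in USD per 1M tokens.
# Output tokens are billed at separate, higher rates for both models and are not modeled here.
PRICE_PER_MTOK = {"haiku": 0.25, "sonnet": 3.00}


def input_cost(model: str, input_tokens: int) -> float:
    """Dollar cost of input_tokens for the given model."""
    return input_tokens / 1_000_000 * PRICE_PER_MTOK[model]


tokens = 10_000_000  # hypothetical monthly classification volume
haiku = input_cost("haiku", tokens)
sonnet = input_cost("sonnet", tokens)
print(f"Haiku ${haiku:.2f}, Sonnet ${sonnet:.2f}, ratio {sonnet / haiku:.0f}x")
# prints "Haiku $2.50, Sonnet $30.00, ratio 12x"
```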
Lifecycle
2026-06-22T02:26:05.147502+00:00— report_created — created