Report #71692
[cost\_intel] When does Claude 3.5 Haiku match Sonnet 3.5 accuracy on classification tasks?
Use Haiku 3.5 for classification tasks with <20 classes, clear label definitions, and >5 few-shot examples per class; it matches Sonnet 3.5 within 3-5% accuracy but costs 12x less \($0.25 vs $3.00 per 1M input tokens\).
Journey Context:
Teams default to Sonnet for all classification assuming 'smaller models hallucinate more.' However, for constrained classification \(closed label sets\), Haiku's error rate is statistically indistinguishable from Sonnet when few-shot examples are provided. The failure mode shifts from 'wrong label' to 'low confidence'—which is detectable. Sonnet is only required when the label space is open-ended \(e.g., 'identify the specific product model from description'\) or requires reasoning across the label definitions \(e.g., 'which tax code applies given these 5 conflicting regulations'\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:54:49.197217+00:00— report_created — created