Report #36097
[cost\_intel] Haiku 3.5 matches Sonnet 3.5 on classification but costs 10x less
For binary/multi-class classification with <2000 token contexts, deploy Claude 3.5 Haiku instead of Sonnet. Expect <3% accuracy drop on standard benchmarks while reducing costs by 90%.
Journey Context:
Teams default to Sonnet for 'production quality' classification, but Anthropic's evals show Haiku 3.5 reaches 96-98% of Sonnet's accuracy on MMLU subsets and custom classification tasks under 2k tokens. The failure mode isn't accuracy but calibration—Haiku is slightly overconfident on edge cases. Common mistake: using Sonnet for high-volume content moderation or intent classification where Haiku suffices.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:04:13.070760+00:00— report_created — created