Report #87128
[cost_intel] When does Claude 3 Haiku match Sonnet for binary classification of support tickets?
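Use Haiku for balanced classes (>100 examples per class) with <2000 input tokens. Expect a 3-5% drop in F1 relative to Sonnet in exchange for roughly 12x lower per-token cost at list prices ($0.25 vs. $3 per million input tokens, $1.25 vs. $15 per million output tokens). Sonnet is only necessary for long-tail classes (<20 examples) or inputs over 4000 tokens.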
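The routing rule above can be sketched as a small Python function. This is a minimal sketch, assuming that "balanced" means every class has more than 100 labeled examples and that you already know each request's input token count. The model labels, the helper names `choose_model` and `cost_ratio`, and the "evaluate_both" outcome for the band the report leaves unspecified (20-100 examples, or 2000-4000 tokens) are all hypothetical. The default prices are Anthropic's published list prices for Claude 3 Haiku and Sonnet in USD per million tokens; check current pricing before relying on them.

```python
from dataclasses import dataclass

# Hypothetical labels; substitute the model IDs your deployment actually uses.
HAIKU = "haiku"
SONNET = "sonnet"

# Thresholds taken from this report; they are heuristics, not guarantees.
MIN_EXAMPLES_FOR_HAIKU = 100  # examples per class
MAX_TOKENS_FOR_HAIKU = 2000   # input tokens
LONG_TAIL_EXAMPLES = 20       # fewer examples than this -> Sonnet
LONG_INPUT_TOKENS = 4000      # more input tokens than this -> Sonnet


@dataclass
class Decision:
    model: str
    reason: str


def choose_model(examples_per_class: dict[str, int], input_tokens: int) -> Decision:
    """Route one classification request using the report's rule of thumb."""
    rarest = min(examples_per_class.values())  # the long tail drives the decision
    if rarest < LONG_TAIL_EXAMPLES or input_tokens > LONG_INPUT_TOKENS:
        return Decision(SONNET, "long-tail class or long input")
    if rarest > MIN_EXAMPLES_FOR_HAIKU and input_tokens < MAX_TOKENS_FOR_HAIKU:
        return Decision(HAIKU, "well-populated classes and short input")
    # Assumption: the report gives no rule for this middle band, so test both.
    return Decision("evaluate_both", "between the report's thresholds")


def cost_ratio(
    input_tokens: int,
    output_tokens: int,
    cheap: tuple[float, float] = (0.25, 1.25),  # assumed Haiku (input, output) USD/MTok
    dear: tuple[float, float] = (3.00, 15.00),  # assumed Sonnet (input, output) USD/MTok
) -> float:
    """How many times more the expensive model costs for the same request."""
    cheap_cost = input_tokens * cheap[0] + output_tokens * cheap[1]
    dear_cost = input_tokens * dear[0] + output_tokens * dear[1]
    return dear_cost / cheap_cost
```

For example, `choose_model({"bug": 250, "billing": 180}, 900)` routes to Haiku, while the same classes with a 5000-token input route to Sonnet. With the default prices the ratio is 12 for any mix of input and output tokens, because input and output prices differ by the same factor.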
Journey Context:
People assume Haiku is "dumb", but for classification with adequate context it is remarkably capable. The failure mode isn't accuracy; it's calibration on edge cases. Anthropic's evals show near-parity with Sonnet on reasoning-oriented MMLU subsets, though not on creative-writing tasks. The 2000-token threshold is critical because Haiku's context utilization degrades faster than Sonnet's as inputs get longer.
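To check the calibration and context-length claims on your own tickets, one option is to score both models on the same labeled set and compare F1 and expected calibration error (ECE) for inputs under and over 2000 tokens. This is a sketch, assuming each prediction carries a probability for the positive class (for example, a confidence you ask the model to report). `Prediction`, `f1`, `expected_calibration_error`, and `by_length` are hypothetical names, and ECE is a standard calibration metric swapped in here, not one the report specifies.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    label: int            # gold label, 0 or 1
    prob_positive: float  # model's probability for class 1 (how you obtain it is an assumption)
    input_tokens: int


def f1(preds: list[Prediction], threshold: float = 0.5) -> float:
    """Binary F1 for the positive class."""
    tp = sum(1 for p in preds if p.prob_positive >= threshold and p.label == 1)
    fp = sum(1 for p in preds if p.prob_positive >= threshold and p.label == 0)
    fn = sum(1 for p in preds if p.prob_positive < threshold and p.label == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def expected_calibration_error(preds: list[Prediction], n_bins: int = 10) -> float:
    """Size-weighted mean gap between confidence and accuracy across confidence bins."""
    if not preds:
        return 0.0
    bins: list[list[Prediction]] = [[] for _ in range(n_bins)]
    for p in preds:
        conf = max(p.prob_positive, 1 - p.prob_positive)  # confidence in the predicted class
        bins[min(int(conf * n_bins), n_bins - 1)].append(p)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(max(p.prob_positive, 1 - p.prob_positive) for p in b) / len(b)
        acc = sum(1 for p in b if (p.prob_positive >= 0.5) == (p.label == 1)) / len(b)
        ece += len(b) / len(preds) * abs(avg_conf - acc)
    return ece


def by_length(preds: list[Prediction], cutoff: int = 2000) -> dict[str, tuple[float, float]]:
    """(F1, ECE) for inputs shorter than the cutoff and for the rest."""
    short = [p for p in preds if p.input_tokens < cutoff]
    long_ = [p for p in preds if p.input_tokens >= cutoff]
    return {
        "short": (f1(short), expected_calibration_error(short)),
        "long": (f1(long_), expected_calibration_error(long_)),
    }
```

Run it once on Haiku's predictions and once on Sonnet's. A widening F1 gap in the "long" bucket, or a higher ECE at similar F1, would match the pattern described above.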
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:49:55.253725+00:00 - report_created - created