Report #40539
[cost_intel] When does Claude 3 Haiku match Sonnet performance on classification tasks?
Use Haiku for binary or multiclass classification with under 4k tokens of context and at most 2 reasoning hops; expect 95%+ of Sonnet's quality at 1/12th the input cost ($0.25 vs $3.00 per 1M input tokens at Claude 3 list prices). Switch to Sonnet for entailment, nuance detection, or reasoning of 3 or more hops, where Haiku's accuracy can drop to near random chance.
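The rule of thumb above can be sketched as a small router. This is a hypothetical helper, not part of any Anthropic SDK: `ClassificationJob`, `choose_model`, and `input_cost_usd` are illustrative names. It assumes the caller supplies a token count and a hop estimate, reads "<4k" as 4,000 tokens, and uses Claude 3 launch list prices for input tokens only, so check current pricing before relying on the numbers.

```python
from dataclasses import dataclass

# Input-token list prices in USD per 1M tokens (Claude 3 launch pricing; verify current rates).
INPUT_PRICE_PER_MTOK = {"haiku": 0.25, "sonnet": 3.00}

# Thresholds from the report's rule of thumb; "<4k" is assumed to mean 4,000 tokens.
MAX_HAIKU_CONTEXT_TOKENS = 4_000
MAX_HAIKU_REASONING_HOPS = 2

# Only plain label-assignment tasks are Haiku-eligible; entailment, nuance
# detection, and anything unrecognized fall through to Sonnet.
HAIKU_ELIGIBLE_TASKS = {"binary", "multiclass"}


@dataclass
class ClassificationJob:
    task_type: str       # e.g. "binary", "multiclass", "entailment" (hypothetical labels)
    context_tokens: int  # prompt size, counted by the caller
    reasoning_hops: int  # estimated reasoning steps per item (a judgment call)


def choose_model(job: ClassificationJob) -> str:
    """Return "haiku" or "sonnet" for a classification job (hypothetical helper)."""
    if (
        job.task_type in HAIKU_ELIGIBLE_TASKS
        and job.context_tokens < MAX_HAIKU_CONTEXT_TOKENS
        and job.reasoning_hops <= MAX_HAIKU_REASONING_HOPS
    ):
        return "haiku"
    # Long, multi-hop, or nuance-heavy jobs default to the stronger model.
    return "sonnet"


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Input-side cost only; output tokens are billed separately at higher rates."""
    return INPUT_PRICE_PER_MTOK[model] * input_tokens / 1_000_000


if __name__ == "__main__":
    job = ClassificationJob("binary", context_tokens=1_200, reasoning_hops=1)
    print(choose_model(job))                     # haiku
    print(input_cost_usd("haiku", 10_000_000))   # 2.5
    print(input_cost_usd("sonnet", 10_000_000))  # 30.0
```

Keeping the thresholds as named constants makes it easy to tighten them once your own eval numbers are in; under these prices, 10M input tokens cost $2.50 on Haiku versus $30.00 on Sonnet.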
Journey Context:
Claude 3 Haiku is optimized for speed on pattern-matching tasks. Anthropic's published Claude 3 benchmarks show it trailing Sonnet by only a few points on knowledge-recall MMLU but by a wider margin on reasoning-heavy GPQA. The quality cliff isn't linear: Haiku holds about 90% of Sonnet's performance on 1-2 step reasoning, then collapses rapidly after step 4 because of 'mid-reasoning hallucinations', where it fabricates intermediate results to bridge gaps. A common mistake is using Haiku for 'simple' sentiment analysis that actually requires sarcasm detection or handling implicit negation; there Haiku fails silently, while Sonnet does not.
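Because these failures are silent, one way to check whether the 95%+ figure holds for your data is to score both models on a labeled sample and compare accuracy per slice (for example, plain vs. sarcastic items, or items grouped by hop count). This is a minimal sketch assuming you have already collected both models' predictions; `quality_ratio_by_slice` and the slice tags are hypothetical.

```python
from collections import defaultdict


def quality_ratio_by_slice(gold, haiku_preds, sonnet_preds, slice_tags):
    """Return {slice_tag: Haiku accuracy / Sonnet accuracy} (hypothetical helper).

    All arguments are parallel lists over the same labeled sample. A ratio of
    1.0 means parity; None means Sonnet got nothing right in that slice.
    """
    if not (len(gold) == len(haiku_preds) == len(sonnet_preds) == len(slice_tags)):
        raise ValueError("all inputs must have the same length")

    haiku_hits = defaultdict(int)
    sonnet_hits = defaultdict(int)
    for label, h, s, tag in zip(gold, haiku_preds, sonnet_preds, slice_tags):
        haiku_hits[tag] += h == label  # bool adds as 0 or 1
        sonnet_hits[tag] += s == label

    # Both accuracies share the slice size as denominator, so it cancels out.
    return {
        tag: (haiku_hits[tag] / sonnet_hits[tag]) if sonnet_hits[tag] else None
        for tag in haiku_hits
    }


if __name__ == "__main__":
    gold = ["pos", "neg", "pos", "neg"]
    haiku = ["pos", "neg", "pos", "pos"]  # misses the second sarcastic item
    sonnet = ["pos", "neg", "pos", "neg"]
    tags = ["plain", "plain", "sarcasm", "sarcasm"]
    ratios = quality_ratio_by_slice(gold, haiku, sonnet, tags)
    print(ratios)  # {'plain': 1.0, 'sarcasm': 0.5}
    print([t for t, r in ratios.items() if r is not None and r < 0.95])  # ['sarcasm']
```

A slice whose ratio falls below 0.95 is a candidate for routing to Sonnet even when the overall ratio looks healthy, which is exactly how a sarcasm-heavy subset can hide inside an average.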
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:31:01.012914+00:00— report_created — created