Report #57000
[cost_intel] Claude 3 Haiku vs Sonnet classification accuracy tradeoffs
Use Claude 3 Haiku for binary or multiclass classification when the features are unambiguous and the input text is under 4k tokens. In that regime it achieves 95-98% of Sonnet's accuracy at 1/12th the cost ($0.25 vs $3 per 1M input tokens). Upgrade to Sonnet only when classes overlap semantically or require implicit world knowledge to disambiguate (e.g., 'Is this clause risky?' as opposed to 'Is this spam?').
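The cost side of this tradeoff can be checked with simple arithmetic. The sketch below is illustrative only: the input prices are the ones quoted above, the output prices ($1.25 and $15 per 1M tokens) are assumed from the Claude 3 list prices and should be verified against current pricing, and `job_cost` plus the workload numbers are hypothetical.

```python
# USD per 1M tokens. Input prices are the figures quoted in this report;
# output prices are an assumption (Claude 3 list prices) and should be re-checked.
PRICES = {
    "haiku": {"input": 0.25, "output": 1.25},
    "sonnet": {"input": 3.00, "output": 15.00},
}


def job_cost(model: str, n_docs: int, input_tokens_per_doc: int,
             output_tokens_per_doc: int) -> float:
    """Estimated USD cost of classifying n_docs documents with one model."""
    price = PRICES[model]
    total_input = n_docs * input_tokens_per_doc
    total_output = n_docs * output_tokens_per_doc
    return (total_input * price["input"] + total_output * price["output"]) / 1_000_000


# Hypothetical workload: 100k documents, ~2k input tokens each,
# and a short label (~10 output tokens) per document.
haiku_cost = job_cost("haiku", 100_000, 2_000, 10)
sonnet_cost = job_cost("sonnet", 100_000, 2_000, 10)
cost_ratio = sonnet_cost / haiku_cost

print(f"Haiku:  ${haiku_cost:,.2f}")
print(f"Sonnet: ${sonnet_cost:,.2f}")
print(f"Sonnet / Haiku: {cost_ratio:.1f}x")
```

Because classification outputs are short, the input price dominates the bill, so the overall ratio tracks the input-price ratio closely.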
Journey Context:
Haiku is optimized for speed and cost, not reasoning. On MMLU and classification benchmarks, it scores within 3-5 points of Sonnet on factual questions. The failure modes are subtle: Haiku misses implicit negations, struggles with sarcasm in sentiment analysis, and cannot reliably handle 'it depends' classifications that require multi-hop reasoning. At list prices the cost delta is 12x for both input ($0.25 vs $3 per 1M tokens) and output ($1.25 vs $15 per 1M tokens), which makes Haiku the default for classification unless the confusion matrix shows more than 2% accuracy degradation on validation sets.
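The validation-set rule above can be sketched as a small gate. This is a minimal illustration, not a definitive procedure: `accuracy`, `pick_model`, and `max_drop` are hypothetical names, the predictions are assumed to have been collected from each model beforehand, and the 2% threshold is interpreted here as an absolute drop in accuracy (percentage points), which the report does not specify.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_model(y_true: Sequence[str], haiku_pred: Sequence[str],
               sonnet_pred: Sequence[str], max_drop: float = 0.02) -> str:
    """Default to Haiku; switch to Sonnet only if Haiku loses more than
    max_drop accuracy on the same validation set.

    Assumption: max_drop is an absolute difference (0.02 = 2 points).
    """
    drop = accuracy(y_true, sonnet_pred) - accuracy(y_true, haiku_pred)
    return "sonnet" if drop > max_drop else "haiku"


# Toy validation set: 100 spam examples (made-up data for illustration).
gold = ["spam"] * 100
haiku_out = ["spam"] * 95 + ["ham"] * 5    # 95% accurate
sonnet_out = ["spam"] * 98 + ["ham"] * 2   # 98% accurate

choice = pick_model(gold, haiku_out, sonnet_out)
print(choice)
```

In practice it is worth running the same check per class as well, since an acceptable overall accuracy can hide a large drop on the one class that involves negation or sarcasm.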
Lifecycle
2026-06-20T02:09:49.241677+00:00— report_created — created