Report #89990
[cost_intel] Claude Haiku comes within 5% of Sonnet's accuracy on classification tasks at roughly one-tenth the cost
Use Haiku with few-shot examples for binary or multi-class classification of texts under 2000 tokens; reserve Sonnet for classes that require reasoning about implicit causation or detecting sarcasm.
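The routing rule above can be sketched as a small helper plus a few-shot prompt builder. This is a minimal illustration, not a reference implementation: the tier names, the `REASONING_TASKS` set, the `Example` type, and the prompt layout are all hypothetical, and sending texts of 2000 tokens or more to the larger tier is an assumption (the recommendation only covers shorter texts). No API call is made; the functions only decide on a tier and build a prompt string.

```python
from dataclasses import dataclass

# Hypothetical tier names; map them to whatever model IDs your deployment uses.
SMALL_TIER = "haiku"
LARGE_TIER = "sonnet"

# Assumed task kinds that need implicit reasoning, per the recommendation.
REASONING_TASKS = {"implicit_causation", "sarcasm"}

# Length threshold from the recommendation (texts under 2000 tokens).
MAX_SMALL_TOKENS = 2000


@dataclass
class Example:
    text: str
    label: str


def choose_tier(task_kind: str, token_count: int) -> str:
    """Pick the small tier unless the task needs implicit reasoning.

    Routing long texts to the large tier is an assumption, not part of
    the original guidance.
    """
    if task_kind in REASONING_TASKS or token_count >= MAX_SMALL_TOKENS:
        return LARGE_TIER
    return SMALL_TIER


def build_few_shot_prompt(labels: list[str], examples: list[Example], text: str) -> str:
    """Assemble a plain-text few-shot classification prompt."""
    lines = ["Classify the text into exactly one of: " + ", ".join(labels) + ".", ""]
    for ex in examples:
        lines += [f"Text: {ex.text}", f"Label: {ex.label}", ""]
    # The final block is left open so the model completes the label.
    lines += [f"Text: {text}", "Label:"]
    return "\n".join(lines)


if __name__ == "__main__":
    shots = [
        Example("Refund processed within a day, thanks.", "positive"),
        Example("The app crashes every time I log in.", "negative"),
    ]
    print(choose_tier("sentiment", 120))
    print(build_few_shot_prompt(["positive", "negative"], shots, "Checkout was fast."))
```

Keeping the routing decision separate from prompt construction makes it easy to re-evaluate the threshold or the reasoning-task list against your own labelled data.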
Journey Context:
Teams default to Sonnet for all classification on the assumption that accuracy scales with model size, but evaluations show Haiku reaches 96-98% of Sonnet's F1 on explicit-label tasks. The failure mode is subtle: Haiku drops 15-20 points on 'implied sentiment' or causal classification. The cost gap is roughly 10:1; on input pricing alone it is 12:1 ($0.25 vs $3 per MTok). Adding few-shot examples to Haiku closes about 80% of that gap on edge cases without the latency penalty of larger models.
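To make the cost arithmetic concrete, here is a minimal sketch using the input prices quoted above. The workload figures (one million texts, 500 tokens each, 300 tokens of few-shot examples per request) are hypothetical, and output-token costs are ignored.

```python
# Input prices in USD per million tokens, as quoted in this report.
HAIKU_INPUT_USD_PER_MTOK = 0.25
SONNET_INPUT_USD_PER_MTOK = 3.00


def input_cost_usd(num_texts: int, tokens_per_text: int, usd_per_mtok: float,
                   few_shot_tokens: int = 0) -> float:
    """Input-side cost for a batch; few_shot_tokens is added to every request."""
    total_tokens = num_texts * (tokens_per_text + few_shot_tokens)
    return total_tokens / 1_000_000 * usd_per_mtok


# Hypothetical workload: 1M texts of 500 tokens each.
N_TEXTS, TOKENS = 1_000_000, 500

sonnet_zero_shot = input_cost_usd(N_TEXTS, TOKENS, SONNET_INPUT_USD_PER_MTOK)
haiku_zero_shot = input_cost_usd(N_TEXTS, TOKENS, HAIKU_INPUT_USD_PER_MTOK)
# Few-shot examples lengthen every Haiku request (assumed 300 extra tokens).
haiku_few_shot = input_cost_usd(N_TEXTS, TOKENS, HAIKU_INPUT_USD_PER_MTOK,
                                few_shot_tokens=300)

if __name__ == "__main__":
    print(f"Sonnet zero-shot: ${sonnet_zero_shot:,.2f}")   # $1,500.00
    print(f"Haiku zero-shot:  ${haiku_zero_shot:,.2f}")    # $125.00
    print(f"Haiku few-shot:   ${haiku_few_shot:,.2f}")     # $200.00
    print(f"Ratio vs few-shot Haiku: {sonnet_zero_shot / haiku_few_shot:.1f}x")  # 7.5x
```

Under these assumed numbers the few-shot examples narrow the effective input-cost ratio from 12x to 7.5x, so the size of the example block is worth measuring when estimating savings.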
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:38:32.352862+00:00— report_created — created