Report #68936
[cost\_intel] When does Claude 3 Haiku match Claude 3.5 Sonnet on accuracy?
Use Haiku for binary or multiclass classification on structured data \(JSON/CSV\) when more than 500 in-context learning \(ICL\) examples are supplied; expect a <3% accuracy drop vs Sonnet while cutting input cost by roughly 12x \($0.25 vs $3.00 per 1M input tokens\).
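The cost side of this recommendation is plain arithmetic, and a rough sketch of it follows. This is an illustration only: the function names, the tokens-per-example figure, and the request volume are hypothetical, and the per-million-token prices are assumptions that should be checked against the current price list before relying on the result.

```python
def input_cost_usd(tokens: int, price_per_mtok: float) -> float:
    """Cost in USD of sending `tokens` input tokens at a price per 1M tokens."""
    return tokens / 1_000_000 * price_per_mtok


def cost_ratio(cheap_price_per_mtok: float, expensive_price_per_mtok: float) -> float:
    """How many times more the expensive model costs per input token."""
    if cheap_price_per_mtok <= 0:
        raise ValueError("cheap_price_per_mtok must be positive")
    return expensive_price_per_mtok / cheap_price_per_mtok


# Hypothetical workload: 500 in-context examples at about 60 tokens each,
# repeated in the prompt of each of 10,000 classification requests.
tokens_per_request = 500 * 60
total_tokens = tokens_per_request * 10_000

# Assumed input prices in USD per 1M tokens; substitute current list prices.
haiku_price = 0.25
sonnet_price = 3.00

print(f"Haiku:  ${input_cost_usd(total_tokens, haiku_price):,.2f}")
print(f"Sonnet: ${input_cost_usd(total_tokens, sonnet_price):,.2f}")
print(f"Ratio:  {cost_ratio(haiku_price, sonnet_price):.1f}x")
```

Because a large ICL prompt is resent with every request, input tokens dominate the bill for this kind of workload, which is why the comparison here is made on input price alone.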
Journey Context:
A common error is assuming that reasoning-heavy benchmarks \(MATH, GSM8K\) predict classification performance. Haiku fails on multi-hop reasoning but excels at pattern matching given sufficient ICL examples. Quality degradation signature: watch for 'confidence inversion', where Haiku assigns higher confidence than Sonnet does to wrong labels on distribution-shifted inputs. Use Sonnet only when the task requires handling adversarial examples with subtle perturbations not seen in the training distribution.
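One way to check for the confidence inversion signature described above is to score both models on the same distribution-shifted evaluation set and compare how confident each one is when it is wrong. The sketch below is a minimal version of that check under stated assumptions: the `Prediction` record, the function names, and the 0.05 margin are all hypothetical, and the report does not say how confidence is obtained (a self-reported score requested in the prompt is one possibility).

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Prediction:
    predicted: str     # label the model chose
    confidence: float  # confidence in [0, 1]; how it is obtained is an assumption
    actual: str        # ground-truth label


def mean_confidence_when_wrong(preds: List[Prediction]) -> Optional[float]:
    """Average confidence over misclassified items, or None if none were wrong."""
    wrong = [p.confidence for p in preds if p.predicted != p.actual]
    return mean(wrong) if wrong else None


def has_confidence_inversion(
    small: List[Prediction],
    large: List[Prediction],
    margin: float = 0.05,  # hypothetical threshold; tune on your own data
) -> bool:
    """True if the smaller model is more confident in its errors than the
    larger model by at least `margin`. If either model made no errors, the
    comparison is undefined and this returns False."""
    small_conf = mean_confidence_when_wrong(small)
    large_conf = mean_confidence_when_wrong(large)
    if small_conf is None or large_conf is None:
        return False
    return small_conf - large_conf >= margin


# Toy, made-up records for illustration only (not measured results).
haiku_preds = [Prediction("spam", 0.9, "ham"), Prediction("ham", 0.8, "ham")]
sonnet_preds = [Prediction("spam", 0.6, "ham"), Prediction("ham", 0.9, "ham")]
print(has_confidence_inversion(haiku_preds, sonnet_preds))
```

Both lists should cover the same shifted inputs; comparing error confidence across different evaluation sets would not isolate the model difference.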
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:11:25.720996+00:00— report_created — created