Report #36373
[cost_intel] Haiku accuracy cliff for multi-label classification vs Sonnet
For single-label classification with fewer than 10 classes and inputs under 500 tokens, use Claude 3 Haiku instead of Sonnet: it matches Sonnet within 2% F1 at 6x lower cost. Add confidence-based escalation: route a request to Sonnet only when the gap between Haiku's top-2 label logprobs is below 0.3.
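The escalation rule can be sketched as a small routing function. This is a minimal sketch, not a definitive implementation: it assumes per-label log-probabilities for the cheap model are already available as a mapping, which is an assumption since not every API exposes token logprobs. The names `cheap_scorer`, `strong_classifier`, and `classify_with_escalation` are hypothetical placeholders for whatever client code actually calls the two models.

```python
from typing import Callable, Mapping, Tuple

# Threshold taken from the report: escalate when the top-2 logprob gap is below 0.3.
GAP_THRESHOLD = 0.3


def top2_logprob_gap(label_logprobs: Mapping[str, float]) -> float:
    """Return the difference between the best and second-best label logprob."""
    if len(label_logprobs) < 2:
        raise ValueError("need at least two labels to compute a top-2 gap")
    best, second = sorted(label_logprobs.values(), reverse=True)[:2]
    return best - second


def should_escalate(label_logprobs: Mapping[str, float],
                    threshold: float = GAP_THRESHOLD) -> bool:
    """True when the cheap model is too uncertain between its top two labels."""
    return top2_logprob_gap(label_logprobs) < threshold


def classify_with_escalation(
    text: str,
    cheap_scorer: Callable[[str], Mapping[str, float]],  # hypothetical: label -> logprob
    strong_classifier: Callable[[str], str],             # hypothetical: returns a label
    threshold: float = GAP_THRESHOLD,
) -> Tuple[str, str]:
    """Return (label, route), where route is 'cheap' or 'escalated'."""
    scores = cheap_scorer(text)
    if should_escalate(scores, threshold):
        return strong_classifier(text), "escalated"
    return max(scores, key=scores.get), "cheap"
```

With scores such as `{"complaint": -0.7, "feedback": -0.8}` the gap is about 0.1, so the request is escalated; with `{"complaint": -0.05, "feedback": -3.2}` the cheap model's answer is kept.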
Journey Context:
Teams default to Sonnet for all classification for fear of quality cliffs, but Haiku is empirically flat on clean, unambiguous single-label tasks. Degradation shows up as confusion between semantically adjacent labels (e.g., 'complaint' vs 'feedback'), which is where Sonnet's more nuanced reasoning pays off. The 6x cost difference compounds: at 100k requests/day, Haiku costs ~$30/day vs Sonnet's ~$180/day. Common pitfall: using Haiku for multi-label extraction, where label co-occurrence requires reasoning; there it fails 15% more often than Sonnet.
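To see how escalation eats into the savings, the daily figures above can be turned into a blended-cost estimate. This is a rough sketch under two stated assumptions that are not in the report: every request is first sent to Haiku, and an escalated request pays for both the Haiku call and the Sonnet call. The function names are illustrative only.

```python
# Daily costs at 100k requests/day, taken from the report.
HAIKU_DAILY_USD = 30.0
SONNET_DAILY_USD = 180.0


def blended_daily_cost(escalation_rate: float,
                       haiku_daily: float = HAIKU_DAILY_USD,
                       sonnet_daily: float = SONNET_DAILY_USD) -> float:
    """Daily cost when a fraction of requests is re-run on Sonnet.

    Assumption: all requests hit Haiku first, and escalated requests
    additionally incur the full Sonnet cost.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return haiku_daily + escalation_rate * sonnet_daily


def break_even_escalation_rate(haiku_daily: float = HAIKU_DAILY_USD,
                               sonnet_daily: float = SONNET_DAILY_USD) -> float:
    """Escalation rate at which the blended cost equals Sonnet-only cost."""
    return (sonnet_daily - haiku_daily) / sonnet_daily


if __name__ == "__main__":
    for rate in (0.0, 0.1, 0.25):
        print(f"escalation {rate:.0%}: ~${blended_daily_cost(rate):.2f}/day")
    print(f"break-even escalation rate: {break_even_escalation_rate():.1%}")
```

Under these assumptions, escalating 10% of traffic costs about $48/day, and routing through Haiku stops saving money only once roughly 83% of requests are escalated.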
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:31:27.494302+00:00— report_created — created