Report #72314
[cost\_intel] Where does Claude 3.5 Haiku's classification accuracy fall off a cliff compared to Sonnet, despite its 12x lower cost?
Avoid Haiku for classification tasks that involve ambiguous negation, implicit temporal reasoning, or class imbalance more severe than 1:100; route edge cases to Sonnet when false negatives are much costlier than false positives.
Journey Context:
Haiku costs $0.25/MTok input versus Sonnet's $3.00/MTok \(a 12x difference\), which makes it attractive for high-volume classification. Aggregate benchmarks show <5% F1 difference on clean datasets. However, Haiku exhibits systematic failure modes on three kinds of edge case: \(1\) ambiguous negation \('not unhappy'\), where it loses 18-25% precision relative to Sonnet; \(2\) temporal reasoning that requires implicit deduction \('before Q2' vs 'after Q1 end'\); \(3\) high-imbalance fraud detection, where minority-class recall drops 20% relative to Sonnet. The cost-quality curve is deceptive: quality is flat and high on the 95% of inputs that are easy, then drops to unusable levels on the 5% that matter most to the business \(fraud, safety\). The fix is a confidence-based router: accept Haiku's prediction when its confidence is high \(probability >0.9\) and escalate low-confidence or ambiguous cases to Sonnet. This hybrid retains 90% of Haiku's cost savings while reaching 98% of Sonnet's accuracy.
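The router described above can be sketched roughly as follows. This is a minimal illustration, not the reporter's implementation: the `Classifier` signature, the `Routed` type, and the stub classifiers are hypothetical, and the report does not say how the confidence value is obtained, so the sketch simply assumes each classifier returns a label with a probability. The cost helpers use only the input prices quoted above and assume every request hits Haiku first and that an escalated request sends the same number of input tokens to Sonnet.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical classifier signature: text -> (label, confidence in [0, 1]).
# How the confidence is produced is an assumption; the report does not specify it.
Classifier = Callable[[str], Tuple[str, float]]

HAIKU_USD_PER_MTOK = 0.25   # input price quoted in the report
SONNET_USD_PER_MTOK = 3.00  # input price quoted in the report


@dataclass
class Routed:
    label: str
    model: str        # "haiku" or "sonnet"
    confidence: float


def route(text: str, cheap: Classifier, strong: Classifier,
          threshold: float = 0.9) -> Routed:
    """Keep the cheap model's answer only if its confidence exceeds the threshold."""
    label, conf = cheap(text)
    if conf > threshold:
        return Routed(label, "haiku", conf)
    # Low-confidence or ambiguous case: escalate to the stronger model.
    label, conf = strong(text)
    return Routed(label, "sonnet", conf)


def blended_input_cost(escalation_rate: float) -> float:
    """USD per MTok of input when every request is tried on the cheap model first."""
    return HAIKU_USD_PER_MTOK + escalation_rate * SONNET_USD_PER_MTOK


def savings_retained(escalation_rate: float) -> float:
    """Fraction of the Haiku-only saving (relative to Sonnet-only) the hybrid keeps."""
    full_saving = SONNET_USD_PER_MTOK - HAIKU_USD_PER_MTOK
    hybrid_saving = SONNET_USD_PER_MTOK - blended_input_cost(escalation_rate)
    return hybrid_saving / full_saving


# Stand-in classifiers for demonstration only; real ones would call the models.
def stub_cheap(text: str) -> Tuple[str, float]:
    # Pretend that negation makes the cheap model unsure.
    return ("positive", 0.55) if "not " in text else ("positive", 0.97)


def stub_strong(text: str) -> Tuple[str, float]:
    return ("positive", 0.95)


if __name__ == "__main__":
    for sample in ["great product", "not unhappy with it"]:
        print(sample, "->", route(sample, stub_cheap, stub_strong))
    print("savings retained at 5% escalation:", round(savings_retained(0.05), 3))
```

Under these assumptions the arithmetic gives a rough sanity check on the 90% figure: retaining 90% of the $2.75/MTok saving requires an escalation rate of about 9%, since 2.75 - 3.00f = 0.9 × 2.75 gives f ≈ 0.092. Output-token prices and the extra latency of the second call are not modelled here.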
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:57:55.615725+00:00— report_created — created