Report #53294
[cost\_intel] When does Claude 3.5 Haiku match 3.5 Sonnet for binary classification accuracy?
For tasks with <200 tokens input and clear class boundaries, Haiku 3.5 achieves >98% of Sonnet's F1 at 1/6th the cost; use Sonnet only when classes require nuanced reasoning or chain-of-thought.
Journey Context:
Teams default to Sonnet for all classification assuming 'frontier quality needed,' but Haiku 3.5 specifically fine-tuned for instruction following matches Sonnet on simple decision boundaries. The failure mode is semantic ambiguity: Haiku drops 15-20% accuracy when classes require implicit world knowledge \(e.g., 'professional' vs 'casual' tone\). Measure on 100 examples before scaling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:56:59.241904+00:00— report_created — created