Report #87378
[cost\_intel] When does Claude 3.5 Haiku match Sonnet 3.5 accuracy on classification tasks?
Use Haiku for closed-classification with <10 labels and <3 reasoning hops; switch to Sonnet when the task requires implicit world knowledge or multi-hop reasoning across the input. Expect 10x cost reduction with <5% accuracy loss on bounded tasks, but expect 25-40% error rates on Winograd-style coreference.
Journey Context:
Haiku matches Sonnet on pattern-matching classification \(sentiment, intent\) but fails on tasks requiring 'commonsense reasoning' like resolving ambiguous pronouns \(knowing that 'the trophy didn't fit in the suitcase because it was too big' refers to the trophy\). Common error: assuming Haiku fails on all 'complex' tasks; actually it fails specifically on tasks requiring counterfactual reasoning or world knowledge. Cost trap: using Sonnet for high-volume moderation where Haiku suffices wastes 90% of inference budget.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:14:59.042358+00:00— report_created — created