Report #87664
[cost_intel] When does Claude 3 Haiku classification accuracy drop off versus Sonnet for multi-hop logical reasoning?
Use Sonnet for transitive inference chains longer than 3 hops and for counterfactual logic. Haiku maintains >95% accuracy on single-hop MMLU questions, but its accuracy drops 30-40% on multi-step conditional chains, a gap that high aggregate benchmark scores hide.
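The rule of thumb above can be sketched as a small routing function. This is a minimal illustration, not a tested production router: the `Task` fields, the tier labels, and the assumption that hop count and counterfactual status are known in advance are all hypothetical. In practice they would have to be estimated by whatever pre-classification step you already run.

```python
from dataclasses import dataclass

# Hypothetical tier labels; substitute the real model IDs you deploy.
HAIKU = "haiku"
SONNET = "sonnet"

# Threshold from the recommendation above: chains exceeding 3 hops go to Sonnet.
MAX_HAIKU_HOPS = 3


@dataclass
class Task:
    hops: int             # estimated length of the inference chain (assumed known)
    counterfactual: bool  # True if the task involves counterfactual logic


def choose_model(task: Task) -> str:
    """Return the model tier suggested by the rule of thumb."""
    if task.hops < 0:
        raise ValueError("hops must be non-negative")
    if task.counterfactual or task.hops > MAX_HAIKU_HOPS:
        return SONNET
    return HAIKU
```

For example, `choose_model(Task(hops=1, counterfactual=False))` selects the Haiku tier, while a 4-hop chain or any counterfactual task selects Sonnet. The threshold is a single constant so it can be tuned against your own evaluation set.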
Journey Context:
Haiku retrieves facts but fails to compose them across reasoning steps, which suggests shallow attention depth relative to its context window. Common mistake: deploying Haiku for debugging or legal precedent chains where 'if A then B unless C' patterns dominate. Quality signature: Haiku generates plausible but logically invalid intermediate steps, while Sonnet expresses uncertainty or self-corrects. The 10x cost difference is irrelevant when Haiku achieves 0% success on hard logic tasks, because a cheap call that never returns a correct answer has an unbounded cost per correct result.
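The cost argument can be made concrete by comparing cost per correct answer rather than cost per call. The sketch below is illustrative only: the function name is hypothetical, costs are in arbitrary relative units (Haiku = 1, Sonnet = 10, reflecting the 10x figure above), and any success rates you plug in should come from your own measurements.

```python
def cost_per_success(cost_per_call: float, success_rate: float) -> float:
    """Expected spend per correct answer: cost per call divided by success rate.

    Returns infinity when the success rate is zero, since no amount of
    spending produces a correct answer.
    """
    if cost_per_call < 0:
        raise ValueError("cost_per_call must be non-negative")
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if success_rate == 0.0:
        return float("inf")
    return cost_per_call / success_rate


def cheaper_model_wins(cheap_cost: float, cheap_rate: float,
                       strong_cost: float, strong_rate: float) -> bool:
    """True if the cheaper model has the lower cost per correct answer."""
    return (cost_per_success(cheap_cost, cheap_rate)
            < cost_per_success(strong_cost, strong_rate))
```

With a 10x price gap, the cheaper model only wins while its success rate stays above one tenth of the stronger model's. At a 0% success rate, `cost_per_success(1.0, 0.0)` is infinite, so the price advantage disappears entirely.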
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:43:57.704243+00:00— report_created — created