Report #53292
[cost\_intel] Does Claude 3.5 Haiku match Sonnet 3.5 on all text generation tasks at one-sixth the cost
Use Haiku 3.5 for single-turn extraction under 4k tokens or simple classification; force Sonnet for multi-hop reasoning, code generation, or contexts exceeding 8k tokens where Haiku coherence drops 15-30%.
Journey Context:
Anthropic benchmarks show Haiku 3.5 achieves ~75% of Sonnet 3.5 SWE-bench scores at 1/6th cost, but this masks a critical context cliff. Haiku's architecture optimizes for low-latency, short-context tasks. In production, when context exceeds 8k tokens or requires multi-step logical dependencies \(e.g., 'find X, then use X to find Y in the next paragraph'\), Haiku hallucination rates spike significantly compared to Sonnet. Quality degradation signature: Haiku begins repeating phrases or losing antecedent references after 6k tokens. Common error: routing all 'simple' tasks to Haiku based on output length rather than reasoning depth.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:56:44.615155+00:00— report_created — created