Report #81922
[cost_intel] Haiku 3.5 vs Sonnet 3.5 quality cliff on implicit cross-sentence coreference
Use Claude 3.5 Haiku for explicit key-value extraction (names, dates, amounts) in documents under 50k tokens; switch to Claude 3.5 Sonnet only for tasks that require cross-sentence coreference, implicit causal reasoning, or legal entailment. Haiku comes within 3% F1 of Sonnet on structured extraction but falls more than 15% behind it on implicit-reasoning benchmarks.
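The rule above is simple enough to encode as a pre-dispatch check. This is a minimal sketch, not a tested router: the task-type labels are hypothetical, the model IDs are assumed to match Anthropic's published IDs for these two models (check the current model list), and sending everything outside the explicit, under-50k-token envelope to Sonnet is an assumption the report does not spell out.

```python
# Sketch of the report's routing rule. Task labels are hypothetical;
# the report names these categories but defines no taxonomy.
EXPLICIT_TASKS = {"key_value_extraction"}
IMPLICIT_TASKS = {
    "cross_sentence_coreference",
    "implicit_causal_reasoning",
    "legal_entailment",
}

# Assumed model IDs; confirm against Anthropic's current model list.
HAIKU = "claude-3-5-haiku-20241022"
SONNET = "claude-3-5-sonnet-20241022"

HAIKU_MAX_TOKENS = 50_000  # from the report's "under 50k tokens" rule


def choose_model(task_type: str, doc_tokens: int) -> str:
    """Return the model ID the report's rule of thumb would pick."""
    if task_type in IMPLICIT_TASKS:
        return SONNET
    if task_type in EXPLICIT_TASKS and doc_tokens < HAIKU_MAX_TOKENS:
        return HAIKU
    # Assumption: unknown task types and long documents default to Sonnet,
    # since the report only vouches for Haiku inside the explicit,
    # under-50k-token envelope.
    return SONNET


if __name__ == "__main__":
    print(choose_model("key_value_extraction", 12_000))
    print(choose_model("legal_entailment", 12_000))
    print(choose_model("key_value_extraction", 80_000))
```

Defaulting to the stronger model is the conservative choice here: a misrouted task then costs extra tokens rather than silent extraction errors.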
Journey Context:
Anthropic's evaluation shows Haiku 3.5 achieving near-parity with Sonnet on SWDE and other structured-extraction benchmarks at roughly a quarter of Sonnet's per-token list price, but with a sharp accuracy cliff on multi-hop reasoning tasks that require synthesizing implicit information. Teams commonly over-provision Sonnet for simple invoice or contract field extraction, wasting budget. The deciding factor is explicit versus implicit information: if the answer is stated literally in the text (explicit), Haiku suffices; if it requires connecting mentions spread across the document (implicit), Sonnet is needed to avoid expensive error-correction loops.
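The break-even argument can be made concrete by pricing in rework: expected cost per document is the model spend plus the error rate times the cost of one correction. This is a sketch under stated assumptions only. The per-token prices are assumed list prices (verify against current pricing), and the error rates, document size, and $2 correction cost are made-up values chosen to mirror the report's gaps (about 2 points on explicit extraction, about 17 on implicit reasoning), not measurements.

```python
# Sketch: expected cost per document = model spend + expected rework.
# All prices, error rates, and the correction cost are illustrative
# assumptions, not measurements from the report.


def model_cost(input_tokens: int, output_tokens: int,
               price_in_per_mtok: float, price_out_per_mtok: float) -> float:
    """Dollar cost of one call, with prices quoted per million tokens."""
    return (input_tokens * price_in_per_mtok
            + output_tokens * price_out_per_mtok) / 1_000_000


def expected_cost_per_doc(call_cost: float, error_rate: float,
                          correction_cost: float) -> float:
    """Call cost plus the expected cost of fixing a wrong answer."""
    return call_cost + error_rate * correction_cost


# Assumed list prices, USD per million (input, output) tokens.
PRICES = {"haiku": (0.80, 4.00), "sonnet": (3.00, 15.00)}

# Hypothetical error rates mirroring the report's accuracy gaps.
ERROR_RATES = {
    "explicit": {"haiku": 0.05, "sonnet": 0.03},
    "implicit": {"haiku": 0.25, "sonnet": 0.08},
}

CORRECTION_COST = 2.00  # hypothetical cost of one human review, USD

if __name__ == "__main__":
    for task, rates in ERROR_RATES.items():
        for model, (p_in, p_out) in PRICES.items():
            # Hypothetical document: 20k input tokens, 500 output tokens.
            call = model_cost(20_000, 500, p_in, p_out)
            total = expected_cost_per_doc(call, rates[model], CORRECTION_COST)
            print(f"{task:8s} {model:6s} ${total:.4f}")
```

With these made-up inputs Haiku comes out ahead on explicit extraction ($0.1180 vs $0.1275 per document) while Sonnet wins on implicit reasoning ($0.2275 vs $0.5180): once its extra errors are priced in, the cheaper model stops being cheaper.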
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:06:09.920370+00:00— report_created — created