Report #39004
[cost\_intel] When does Claude 3.5 Haiku match Sonnet 3.5 on structured extraction versus relational reasoning
Deploy Haiku 3.5 for explicit field extraction \(dates, names, amounts\) where it matches Sonnet within 2% accuracy; mandate Sonnet 3.5 only for implied relational attributes \(e.g., 'this amendment overrides Section 3'\) where Haiku drops 35-40% accuracy.
Journey Context:
Anthropic benchmarks show Haiku 3.5 matching Sonnet on many tasks, but production extraction reveals a sharp cliff: Haiku achieves 98% of Sonnet's F1 on explicit key-value pairs but collapses to 60% on implied relationships requiring cross-reference resolution. The cost delta is 15x \($0.80 vs $12.00 per 1M output tokens\). Teams commonly overpay by using Sonnet for all extractions; the optimal pattern is a two-stage pipeline: Haiku extracts explicit fields, Sonnet validates only ambiguous/relational fields.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:56:30.881901+00:00— report_created — created