Report #62657
[cost_intel] Claude 3.5 Haiku matches Sonnet on structured extraction but collapses on multi-hop reasoning across documents
Use Haiku for single-pass schema extraction with <50 fields and flat structure; mandate Sonnet when extraction requires cross-paragraph logical inference, arithmetic across sections, or conditional logic
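The routing rule above could be encoded roughly as follows. This is a minimal sketch, not a tested policy: the `ExtractionTask` fields, the `choose_model` function, and the model labels are hypothetical names introduced for illustration, and the fallback to Sonnet for cases the rule does not mention is an assumption.

```python
from dataclasses import dataclass

# Hypothetical labels; substitute the model identifiers your API client expects.
HAIKU = "haiku"
SONNET = "sonnet"

# Threshold taken from the recommendation: fewer than 50 fields.
MAX_FLAT_FIELDS = 50


@dataclass
class ExtractionTask:
    """Hypothetical description of one extraction job."""
    field_count: int
    single_pass: bool = True
    nested_schema: bool = False             # True when the schema is not flat
    cross_paragraph_inference: bool = False
    cross_section_arithmetic: bool = False
    conditional_logic: bool = False


def choose_model(task: ExtractionTask) -> str:
    """Pick a model tier following the report's recommendation."""
    needs_reasoning = (
        task.cross_paragraph_inference
        or task.cross_section_arithmetic
        or task.conditional_logic
    )
    if needs_reasoning:
        return SONNET
    is_simple = (
        task.single_pass
        and not task.nested_schema
        and task.field_count < MAX_FLAT_FIELDS
    )
    if is_simple:
        return HAIKU
    # Assumption: anything the rule does not explicitly cover goes to Sonnet.
    return SONNET
```

For example, `choose_model(ExtractionTask(field_count=20))` selects Haiku, while setting `conditional_logic=True` on the same task selects Sonnet.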
Journey Context:
Benchmarks on 100k-token legal documents show Haiku at 94% F1 vs Sonnet at 96% on flat key-value extraction, but Haiku drops to 61% vs Sonnet's 94% on questions requiring synthesis across disconnected sections. Common mistake: assuming that context length alone necessitates Sonnet. Haiku's context window is identical; what it lacks is reasoning depth. The roughly 13x cost delta ($0.80 for Sonnet vs $0.06 for Haiku per 100k input tokens at batch rates) only pays off for extraction tasks. For reasoning, Haiku's error rate creates expensive human review loops that eliminate the savings.
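The review-loop argument can be made concrete with a small expected-cost calculation. The sketch below rests on several assumptions not in the report: it treats $0.80 as the Sonnet figure and $0.06 as the Haiku figure, takes one 100k-token document per call, ignores output tokens, and uses 1 - F1 as a rough proxy for the share of results that need human review. The function names and the per-review cost are hypothetical.

```python
def effective_cost(model_cost: float, error_rate: float, review_cost: float) -> float:
    """Expected cost per document: model call plus review of the failures."""
    return model_cost + error_rate * review_cost


def break_even_review_cost(
    cheap_cost: float, cheap_err: float, strong_cost: float, strong_err: float
) -> float:
    """Review cost at which the cheap and strong models cost the same."""
    if cheap_err <= strong_err:
        raise ValueError("cheap model must have the higher error rate")
    return (strong_cost - cheap_cost) / (cheap_err - strong_err)


# Per-document input cost from the report (assumed mapping: Sonnet 0.80, Haiku 0.06).
SONNET_COST, HAIKU_COST = 0.80, 0.06

# Assumption: error rate approximated as 1 - F1.
extraction = break_even_review_cost(HAIKU_COST, 1 - 0.94, SONNET_COST, 1 - 0.96)
reasoning = break_even_review_cost(HAIKU_COST, 1 - 0.61, SONNET_COST, 1 - 0.94)

print(f"extraction break-even review cost: ${extraction:.2f}")
print(f"reasoning break-even review cost:  ${reasoning:.2f}")
```

Under these assumptions Haiku stays cheaper on flat extraction until a single human review costs about $37, but on cross-section reasoning it loses once a review costs more than about $2.24.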
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:39:12.784894+00:00— report_created — created