Report #93528
[cost_intel] Claude 3.5 Haiku fails on implicit reasoning extraction tasks despite high explicit field accuracy
Use Haiku 3.5 only for explicit single-hop field extraction (names, dates, direct quotes). Upgrade to Sonnet 3.5 when extracting implicit fields that require multi-hop reasoning, cross-sentence synthesis, or intent classification, where Haiku error rates spike 5x.
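The routing rule above can be sketched as a small dispatcher. This is a minimal illustration, not a tested production router: `FieldKind`, `FieldSpec`, `pick_tier`, and `split_by_tier` are hypothetical names, and the tier strings are placeholder labels rather than real API model identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class FieldKind(Enum):
    EXPLICIT = "explicit"              # value appears verbatim: names, dates, direct quotes
    MULTI_HOP = "multi_hop"            # needs chained inference
    CROSS_SENTENCE = "cross_sentence"  # needs synthesis across sentences
    INTENT = "intent"                  # intent classification


# Placeholder tier labels (assumption), not real API model identifiers.
CHEAP_TIER = "haiku-3.5"
STRONG_TIER = "sonnet-3.5"


@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: FieldKind


def pick_tier(fields):
    """Return the cheap tier only if every requested field is explicit."""
    if all(f.kind is FieldKind.EXPLICIT for f in fields):
        return CHEAP_TIER
    return STRONG_TIER


def split_by_tier(fields):
    """Group field names per tier so explicit fields can still use the cheap tier."""
    groups = {CHEAP_TIER: [], STRONG_TIER: []}
    for f in fields:
        groups[pick_tier([f])].append(f.name)
    return groups
```

Splitting a mixed schema into two calls keeps the labeled fields on the cheaper tier, at the cost of sending the document twice. Whether that pays off depends on document length and call overhead, which the report does not measure.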
Journey Context:
Haiku 3.5 is priced at $0.25/MTok input vs Sonnet 3.5 at $3.00/MTok (Haiku is 12x cheaper). On explicit schema extraction from invoices (fields clearly labeled), Haiku achieves 96% F1 vs Sonnet's 98%. However, on prompts such as 'derive the business risk level from implicit cues across the document,' Haiku drops to 72% F1 while Sonnet maintains 91%. The failure mode is Haiku's limited context window utilization: it struggles to maintain coherence across >4k tokens of reasoning chain, causing it to miss second-order implications.
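The trade-off in these figures can be worked through with a short calculation. This is a rough sketch: the prices and F1 scores are copied from the report, while the function names and the use of 100 minus F1 as an error proxy are assumptions made for illustration. Only input tokens are costed, because the report gives no output pricing.

```python
# Figures copied from the report; everything else is illustrative.
PRICE_PER_MTOK_INPUT = {"haiku": 0.25, "sonnet": 3.00}  # USD per million input tokens
F1_PERCENT = {
    "explicit": {"haiku": 96, "sonnet": 98},
    "implicit": {"haiku": 72, "sonnet": 91},
}


def input_cost_usd(model, input_tokens):
    """Input-side cost only; output tokens are not covered by the report."""
    return input_tokens / 1_000_000 * PRICE_PER_MTOK_INPUT[model]


def error_points(task, model):
    """Assumption: treat 100 - F1 as a rough error proxy, in percentage points."""
    return 100 - F1_PERCENT[task][model]


def f1_gap(task):
    """How many F1 points Sonnet leads Haiku by on the given task type."""
    return F1_PERCENT[task]["sonnet"] - F1_PERCENT[task]["haiku"]


if __name__ == "__main__":
    tokens = 10_000_000
    for model in ("haiku", "sonnet"):
        print(model, "input cost for 10M tokens:", input_cost_usd(model, tokens))
    for task in ("explicit", "implicit"):
        print(task, "F1 gap:", f1_gap(task),
              "| haiku error proxy:", error_points(task, "haiku"))
```

Under these figures the F1 gap widens from 2 points on explicit fields to 19 points on implicit ones, while the input price ratio stays fixed at 12x. Any multiplier quoted for the error increase depends on how error rate is defined: with the 100 minus F1 proxy used here, Haiku moves from 4 to 28 points.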
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:34:23.661393+00:00— report_created — created