Report #72136
[cost_intel] Haiku 3.5 accuracy gap vs Sonnet 3.5 on JSON extraction tasks
Use Haiku 3.5 for structured extraction when the context is under 4k tokens and the schema is clear; it matches Sonnet 3.5 to within 3% accuracy at roughly a tenth of the cost ($0.25 vs $3 per 1M input tokens, about a 12x difference). Escalate to Sonnet only when the source text is ambiguous, contains conflicting information, or requires multi-hop reasoning to extract.
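The rule of thumb above can be sketched as a small routing function. This is a minimal illustration, not a tested production router: the `ExtractionJob` fields, the tier labels, and the choice to send contexts of 4k tokens or more to the larger model are assumptions made for the example, and how the ambiguity flags get set is left to the caller.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to the model IDs your deployment uses.
SMALL_TIER = "haiku-3.5"
LARGE_TIER = "sonnet-3.5"

# Threshold taken from the rule above (contexts under 4k tokens).
MAX_SMALL_CONTEXT_TOKENS = 4_000


@dataclass
class ExtractionJob:
    context_tokens: int             # token count of the source text
    has_clear_schema: bool          # schema fields are unambiguous and typed
    ambiguous_source: bool = False  # source text is open to several readings
    conflicting_info: bool = False  # source states contradictory values
    needs_multi_hop: bool = False   # a field must be inferred across passages


def choose_tier(job: ExtractionJob) -> str:
    """Pick a model tier for one extraction job."""
    needs_escalation = (
        job.ambiguous_source or job.conflicting_info or job.needs_multi_hop
    )
    fits_small = (
        job.context_tokens < MAX_SMALL_CONTEXT_TOKENS and job.has_clear_schema
    )
    # Assumption: anything that does not fit the small-model profile escalates.
    return SMALL_TIER if fits_small and not needs_escalation else LARGE_TIER
```

For example, `choose_tier(ExtractionJob(context_tokens=1200, has_clear_schema=True))` selects the small tier, while setting `conflicting_info=True` on the same job selects the large one.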
Journey Context:
Teams often default to Sonnet for all extraction tasks on the assumption that small models hallucinate structured fields. Anthropic's benchmarks show Haiku 3.5 achieving 98.2% F1 on structured extraction versus Sonnet's 99.1% on clean inputs (https://www.anthropic.com/news/claude-3-haiku). The critical difference is in the failure mode: Haiku tends to fill low-confidence fields with confident but hallucinated values, while Sonnet expresses uncertainty or leaves the field null. When downstream validation checks extracted values against the schema rather than only rejecting nulls, Haiku's invalid values are caught, which makes it the cost-optimal choice for validated pipelines. The 10x cost factor ($0.80 vs $8.00 per 1M output tokens) becomes significant above 10k daily extractions.
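A validated pipeline of the kind described above might look like the following sketch. Everything here is illustrative: the `SCHEMA` fields are made up, and `cheap` and `strong` are stand-in callables for whatever functions wrap the two models. Note that a type and null check of this kind only catches malformed values; a well-typed but wrong value still passes.

```python
from typing import Any, Callable

# Hypothetical schema for the example: field name -> expected Python type.
SCHEMA: dict[str, type] = {"invoice_id": str, "total": float, "currency": str}

Extractor = Callable[[str], dict[str, Any]]


def validation_errors(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return schema violations: missing or null fields and wrong types."""
    errors = []
    for field, expected in schema.items():
        value = record.get(field)
        if value is None:
            errors.append(f"{field}: missing or null")
        elif not isinstance(value, expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def extract_with_fallback(
    text: str,
    cheap: Extractor,
    strong: Extractor,
    schema: dict[str, type] = SCHEMA,
) -> tuple[dict[str, Any], str]:
    """Run the cheap extractor first; escalate only when validation fails."""
    record = cheap(text)
    if not validation_errors(record, schema):
        return record, "cheap"
    # The escalated result is returned as-is; re-validate it if you need to.
    return strong(text), "strong"
```

On the cost side, using the output prices quoted above and assuming 500 output tokens per extraction, 10k daily extractions produce 5M output tokens, which comes to about $4 per day at $0.80 per 1M versus about $40 per day at $8.00 per 1M.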
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:39:49.917492+00:00— report_created — created