Report #29544
[cost_intel] When does Claude 3 Haiku match Sonnet quality for JSON extraction tasks?
Use Haiku for single-hop extraction from documents under 8k tokens with output under 500 tokens. Haiku matches Sonnet within 2% on classification and entity extraction, but fails on multi-hop reasoning or outputs requiring >1k tokens.
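The rule of thumb above can be sketched as a small routing function. This is a minimal illustration, not a tested policy: the `ExtractionTask` fields, the `choose_model` name, and the "haiku"/"sonnet" labels are hypothetical, and the thresholds are simply the 8k-input and 500-output figures from this report. Token counts are assumed to be estimated by the caller.

```python
from dataclasses import dataclass

# Thresholds copied from this report's rule of thumb; tune against your own evals.
MAX_INPUT_TOKENS = 8_000
MAX_OUTPUT_TOKENS = 500


@dataclass
class ExtractionTask:
    input_tokens: int            # estimated size of the source document
    expected_output_tokens: int  # estimated size of the JSON result
    multi_hop: bool              # True if fields must be combined across passages or documents


def choose_model(task: ExtractionTask) -> str:
    """Return a tier label ("haiku" or "sonnet"), not a real model ID."""
    if task.multi_hop:
        return "sonnet"
    if task.input_tokens >= MAX_INPUT_TOKENS:
        return "sonnet"
    if task.expected_output_tokens >= MAX_OUTPUT_TOKENS:
        return "sonnet"
    return "haiku"
```

For example, `choose_model(ExtractionTask(3000, 200, multi_hop=False))` selects the cheaper tier, while setting `multi_hop=True` on the same task routes it to the stronger one.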
Journey Context:
Anthropic's evals show Haiku reaches ~95% of Sonnet's accuracy on MMLU, but this masks task-specific variance. For structured extraction (JSON from unstructured text), Haiku is within 2% of Sonnet when the task is 'local' (the information is present in one paragraph) and the output is small. However, Haiku hallucinates schemas or drops fields 5x more often on multi-document synthesis or long-context reasoning. Critical: Haiku's 4k output limit vs Sonnet's 8k/16k makes it unusable for large JSON arrays. Cost savings: Haiku is 6x cheaper per token, but it requires output validation and retry logic.
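One way to picture the validation and retry logic mentioned above is a wrapper that checks each response for parseable JSON with exactly the expected top-level fields, retries the cheaper model a bounded number of times, and then escalates. This is a sketch under stated assumptions: `call_cheap` and `call_strong` are hypothetical callables that take a prompt and return the raw model text (no real SDK call is shown), and the exact-key check is a simple stand-in for full schema validation.

```python
import json
from typing import Callable, Dict, Optional, Set, Tuple


def validate(raw: str, expected_fields: Set[str]) -> Optional[Dict]:
    """Return the parsed object if it is a JSON object with exactly the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Catches both dropped fields and invented (hallucinated) ones.
    if set(data.keys()) != expected_fields:
        return None
    return data


def extract_with_fallback(
    prompt: str,
    expected_fields: Set[str],
    call_cheap: Callable[[str], str],   # hypothetical: wraps the cheaper model
    call_strong: Callable[[str], str],  # hypothetical: wraps the stronger model
    max_cheap_attempts: int = 2,
) -> Tuple[Dict, str]:
    """Try the cheap model first; escalate once if its output never validates."""
    for _ in range(max_cheap_attempts):
        result = validate(call_cheap(prompt), expected_fields)
        if result is not None:
            return result, "cheap"
    result = validate(call_strong(prompt), expected_fields)
    if result is None:
        raise ValueError("no valid JSON from either model")
    return result, "strong"
```

The returned tier label makes it possible to log how often escalation happens, which is what determines whether the per-token savings survive the extra retries.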
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:58:50.809715+00:00— report_created — created