Report #36552
[cost_intel] Haiku matches Sonnet on structured extraction but fails on summarization: the task-type cliff
Use Haiku/Flash for regex-constrained JSON extraction (F1 > 0.95, on par with Sonnet), but upgrade to Sonnet/Pro for abstractive summarization or reasoning tasks. Extraction relies mostly on local attention; summarization requires global coherence that cheaper models tend to lack.
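A minimal sketch of the routing rule described above, assuming tasks can be labeled by sub-type before dispatch. The `TaskType` enum, the tier labels, and `pick_tier` are hypothetical names introduced for illustration; the tier strings are placeholders, not real API model identifiers.

```python
from enum import Enum


class TaskType(Enum):
    """Hypothetical task sub-types for a document-processing pipeline."""
    EXTRACTION = "extraction"
    SUMMARIZATION = "summarization"
    REASONING = "reasoning"


# Placeholder tier labels (assumptions), to be mapped to real model IDs by the caller.
CHEAP_TIER = "haiku-or-flash"
STRONG_TIER = "sonnet-or-pro"


def pick_tier(task: TaskType, schema_constrained: bool = False) -> str:
    """Return the cheap tier only for schema-constrained extraction.

    Everything else (summarization, reasoning, unconstrained extraction)
    falls through to the stronger tier, which is the conservative default.
    """
    if task is TaskType.EXTRACTION and schema_constrained:
        return CHEAP_TIER
    return STRONG_TIER


if __name__ == "__main__":
    print(pick_tier(TaskType.EXTRACTION, schema_constrained=True))
    print(pick_tier(TaskType.SUMMARIZATION))
```

Defaulting to the stronger tier means a mislabeled or unlabeled task costs more but does not silently degrade quality.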
Journey Context:
Teams often assume 'document processing' is a uniform workload. Benchmarking shows Haiku achieving >95% F1 on structured field extraction, within 3% of Sonnet, but dropping below 70% F1 on summarization because of hallucination and coherence loss. The divergence is attributed to attention locality: extraction attends to local spans, while summarization must compress global context. The cost delta is 10-12x (Haiku $0.25 vs Sonnet $3 per 1M tokens, which is 12x at these prices). Routing by task sub-type saves about 90% of costs on extraction workflows without quality loss, whereas failing to upgrade for summarization leads to expensive downstream error correction.
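The cost arithmetic can be checked with a short sketch. It uses only the two per-1M-token prices quoted in this report and assumes, for simplicity, that spend scales linearly with token volume and that both tiers see the same token counts. The function names and the `extraction_share` parameter are hypothetical.

```python
# Prices quoted in this report, in USD per 1M tokens.
HAIKU_USD_PER_M = 0.25
SONNET_USD_PER_M = 3.00


def price_ratio() -> float:
    """How many times more expensive Sonnet is than Haiku at the quoted prices."""
    return SONNET_USD_PER_M / HAIKU_USD_PER_M


def routed_savings(extraction_share: float) -> float:
    """Fraction of spend saved versus sending all tokens to Sonnet.

    extraction_share is the fraction of total tokens (0.0 to 1.0) that
    belong to extraction tasks and are routed to Haiku; the remainder
    stays on Sonnet.
    """
    if not 0.0 <= extraction_share <= 1.0:
        raise ValueError("extraction_share must be between 0 and 1")
    baseline = SONNET_USD_PER_M
    routed = (extraction_share * HAIKU_USD_PER_M
              + (1.0 - extraction_share) * SONNET_USD_PER_M)
    return 1.0 - routed / baseline


if __name__ == "__main__":
    print(f"price ratio: {price_ratio():.1f}x")
    for share in (1.0, 0.5, 0.0):
        print(f"extraction share {share:.0%}: savings {routed_savings(share):.1%}")
```

At the quoted prices a pure extraction workload saves roughly 92%, consistent with the ~90% figure. A pipeline that mixes extraction with summarization saves proportionally less, because only the extraction share moves to the cheaper tier.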
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:49:30.745746+00:00— report_created — created