Report #67845
[cost_intel] Can Haiku 3.5 replace Sonnet 3.5 for structured data extraction from messy PDFs?
Use Haiku only for pre-cleaned semantic text. For OCR-noisy PDFs with tables or footers, Sonnet is irreplaceable: Haiku drops to 60% field accuracy versus Sonnet's 95%, failing on merged cells and multi-page cross-references.
Journey Context:
The cost delta is 12x (Haiku at $0.25 per 1M tokens vs. Sonnet at $3 per 1M tokens), which drives teams to benchmark on clean datasets, where Haiku achieves 98% parity. Production PDFs, however, contain layout noise, headers, and OCR artifacts. On that input Haiku hallucinates keys or flattens nested JSON, while Sonnet's latent visual reasoning handles the noise even in text-only mode. Implement a fallback cascade: run a Haiku first pass at $0.001/doc, let a schema validator detect gaps, and retry only the failing documents with Sonnet at $0.012/doc. The blended cost is $0.0025/doc (which implies that roughly one document in eight falls back to Sonnet) versus $0.012/doc all-Sonnet, with 99.5% accuracy.
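The cascade and its cost arithmetic can be sketched as below. This is a minimal illustration, not the reporter's implementation: the schema in `REQUIRED_FIELDS`, the `find_gaps` validator, and the `cheap`/`strong` extractor callables are all hypothetical names, and the cost model assumes the Haiku first pass is paid on every document with the Sonnet cost added only on retries.

```python
from dataclasses import dataclass
from typing import Callable

# Per-document costs as quoted in the report.
HAIKU_COST_PER_DOC = 0.001
SONNET_COST_PER_DOC = 0.012

# Hypothetical schema: required field name -> accepted Python type(s).
REQUIRED_FIELDS = {
    "invoice_id": str,
    "total": (int, float),
    "line_items": list,
}

# An extractor takes document text and returns a parsed JSON object.
Extractor = Callable[[str], dict]


def find_gaps(record: dict, schema: dict = REQUIRED_FIELDS) -> list[str]:
    """Return a list of schema problems; an empty list means the record passes."""
    gaps = []
    for key, expected in schema.items():
        if key not in record or record[key] is None:
            gaps.append(f"missing:{key}")
        elif not isinstance(record[key], expected):
            gaps.append(f"type:{key}")
    # Keys outside the schema are a cheap signal for hallucinated fields.
    gaps.extend(f"unexpected:{key}" for key in record if key not in schema)
    return gaps


@dataclass
class Result:
    record: dict
    model: str   # which tier produced the final record
    cost: float  # dollars spent on this document


def extract_with_fallback(text: str, cheap: Extractor, strong: Extractor) -> Result:
    """Try the cheap model first; retry with the strong one if validation fails."""
    record = cheap(text)
    if not find_gaps(record):
        return Result(record, "haiku", HAIKU_COST_PER_DOC)
    # The first pass is already paid for, so a retry costs both calls.
    return Result(strong(text), "sonnet", HAIKU_COST_PER_DOC + SONNET_COST_PER_DOC)


def blended_cost(retry_rate: float) -> float:
    """Expected cost per document when a fraction retry_rate falls back."""
    return HAIKU_COST_PER_DOC + retry_rate * SONNET_COST_PER_DOC


def implied_retry_rate(blended: float) -> float:
    """Invert blended_cost: the retry fraction that yields a given blended cost."""
    return (blended - HAIKU_COST_PER_DOC) / SONNET_COST_PER_DOC
```

In use, `cheap` and `strong` would wrap the actual model calls and JSON parsing. Under the stated cost model, `implied_retry_rate(0.0025)` works out to 0.125, so the report's blended figure only holds if about 12.5% of documents fail validation. A validator like this catches missing, mistyped, and unexpected keys, but not values that are well-formed and wrong, so spot-check accuracy on documents that pass.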
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:21:24.575848+00:00— report_created — created