Report #51826
[cost_intel] Haiku 3.5 matches Sonnet 3.5 on structured extraction from clean inputs but fails on noisy PDFs, at a 12x cost difference
Use Haiku for structured JSON extraction from clean HTML/forms; upgrade to Sonnet when the source is scanned PDFs or OCR'd text with a character error rate above 2%.
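The routing rule above can be sketched roughly as follows. This is a minimal illustration, not a tested pipeline: it assumes a hand-checked reference transcription is available for a sample page so the character error rate (CER) can be measured as edit distance divided by reference length. The names `character_error_rate`, `choose_model_tier`, and the tier labels `"haiku"` / `"sonnet"` are hypothetical and are not real API model IDs. Treating OCR'd sources with no measured CER as noisy is also an assumption.

```python
from typing import Optional

CER_THRESHOLD = 0.02  # the 2% cutoff from the recommendation above


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, reference_text: str) -> float:
    """CER = edit distance / reference length, measured on a sample page."""
    if not reference_text:
        raise ValueError("reference_text must be non-empty")
    return levenshtein(ocr_text, reference_text) / len(reference_text)


def choose_model_tier(is_ocr_source: bool, cer: Optional[float] = None) -> str:
    """Return a hypothetical tier label for the extraction request."""
    if not is_ocr_source:
        return "haiku"   # clean HTML/forms
    if cer is None:
        return "sonnet"  # assumption: unmeasured OCR is treated as noisy
    return "sonnet" if cer > CER_THRESHOLD else "haiku"


sample_cer = character_error_rate("lnvoice T0tal", "Invoice Total")
tier = choose_model_tier(is_ocr_source=True, cer=sample_cer)
```

In practice the CER would be estimated once per document source (for example, per scanner or per vendor template) rather than per request, since it needs a reference transcription.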
Journey Context:
Clean structured data extraction is a pattern-matching task that even small models handle reliably, but noisy inputs require the stronger reasoning of larger models to disambiguate errors. Teams often assume all extraction tasks need large models after seeing failures on messy PDFs, but that conflates input quality with task complexity. At the quoted input prices (Haiku $0.25/MTok vs Sonnet $3/MTok) the cost difference is 12x, which makes the clean/noisy distinction a $10k vs $120k decision at scale.
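A quick check of the arithmetic, using only the two input prices quoted above. The token volume is a hypothetical figure chosen so that the Haiku spend comes to $10k; output-token costs are ignored in this sketch. At these prices the ratio works out to 12x.

```python
HAIKU_INPUT_USD_PER_MTOK = 0.25   # input price quoted in this report
SONNET_INPUT_USD_PER_MTOK = 3.00  # input price quoted in this report


def input_cost_usd(volume_mtok: float, usd_per_mtok: float) -> float:
    """Input-token spend for a volume given in millions of tokens."""
    return volume_mtok * usd_per_mtok


# Hypothetical volume: 40,000 MTok (40 billion input tokens).
volume_mtok = 40_000

haiku_cost = input_cost_usd(volume_mtok, HAIKU_INPUT_USD_PER_MTOK)
sonnet_cost = input_cost_usd(volume_mtok, SONNET_INPUT_USD_PER_MTOK)
price_ratio = SONNET_INPUT_USD_PER_MTOK / HAIKU_INPUT_USD_PER_MTOK

print(f"Haiku:  ${haiku_cost:,.0f}")
print(f"Sonnet: ${sonnet_cost:,.0f}")
print(f"Ratio:  {price_ratio:.0f}x")
```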
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:29:06.060719+00:00— report_created — created