Report #81719
[cost_intel] Claude 3.5 Haiku vs Sonnet for structured JSON extraction: when does quality collapse
Use Haiku for schemas with fewer than 5 fields and clean source text; switch to Sonnet if the source has OCR noise or the schema requires nested reasoning. Haiku costs $0.80 per million input tokens versus $3 for Sonnet, but its hallucination rate on messy PDFs is 15% versus Sonnet's 2%.
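To make the trade-off concrete, here is a rough break-even sketch in Python. It uses only the prices and messy-PDF hallucination rates quoted above; the tokens-per-document figure and the idea of a fixed dollar cost per bad extraction are illustrative assumptions, not measurements from this report, and output-token cost is ignored.

```python
# Figures quoted in the report (USD per million tokens, treated as input-token prices).
HAIKU_PRICE = 0.80
SONNET_PRICE = 3.00
# Hallucination rates on messy PDFs, from the report.
HAIKU_ERR = 0.15
SONNET_ERR = 0.02


def token_cost(price_per_million: float, tokens_per_doc: int) -> float:
    """Model spend for one document, in USD."""
    return price_per_million * tokens_per_doc / 1_000_000


def break_even_error_cost(tokens_per_doc: int) -> float:
    """Cost of one bad extraction (USD) above which Sonnet is cheaper overall.

    Total cost per document is modeled as token_cost + error_rate * cost_per_error.
    Setting the Haiku and Sonnet totals equal and solving for cost_per_error
    gives the value returned here.
    """
    extra_spend = token_cost(SONNET_PRICE, tokens_per_doc) - token_cost(HAIKU_PRICE, tokens_per_doc)
    errors_avoided = HAIKU_ERR - SONNET_ERR
    return extra_spend / errors_avoided


# Hypothetical document size; substitute your own average.
TOKENS_PER_DOC = 4_000
threshold = break_even_error_cost(TOKENS_PER_DOC)
print(f"Sonnet pays for itself once a bad extraction costs more than ${threshold:.4f}")
```

Under these assumptions the threshold is a few cents per bad document, so on messy PDFs the decision depends far more on what an invented value costs downstream than on the token price difference.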
Journey Context:
Teams often assume Haiku is '80% of Sonnet for 20% cost' universally. In practice, Haiku fails catastrophically on 'implied nulls': when a field is missing from messy text, it invents a value, whereas Sonnet admits uncertainty. The breakpoint is OCR confidence: when Tesseract confidence drops below 90, Haiku's error rate rises roughly 10x. The 5% quality gap on clean data becomes a 30% gap on noisy data.
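The routing rule described here can be sketched as a small Python function. The thresholds (fewer than 5 fields, Tesseract confidence of 90) come from this report; the `ExtractionJob` type, its field names, and the `"haiku"`/`"sonnet"` labels are hypothetical placeholders, and sending schemas with 5 or more fields to Sonnet is one reading of the rule rather than something the report states outright.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder labels; map these to whatever model identifiers your client uses.
HAIKU = "haiku"
SONNET = "sonnet"

MAX_HAIKU_FIELDS = 5        # report: Haiku for schemas with fewer than 5 fields
MIN_OCR_CONFIDENCE = 90.0   # report: below this, Haiku's error rate climbs sharply


@dataclass
class ExtractionJob:
    """Hypothetical description of one extraction request."""
    field_count: int
    has_nested_fields: bool
    # Mean Tesseract confidence on a 0-100 scale; None for text that was never OCR'd.
    ocr_confidence: Optional[float] = None


def choose_model(job: ExtractionJob) -> str:
    """Pick the cheaper model only when every 'clean and simple' condition holds."""
    if job.ocr_confidence is not None and job.ocr_confidence < MIN_OCR_CONFIDENCE:
        return SONNET  # noisy OCR: implied nulls are likely to be filled with invented values
    if job.has_nested_fields:
        return SONNET  # schema needs nested reasoning
    if job.field_count >= MAX_HAIKU_FIELDS:
        return SONNET  # assumption: larger flat schemas also go to Sonnet
    return HAIKU


print(choose_model(ExtractionJob(field_count=3, has_nested_fields=False)))
print(choose_model(ExtractionJob(field_count=3, has_nested_fields=False, ocr_confidence=82.5)))
```

Checking OCR confidence first keeps the most damaging failure mode, invented values for missing fields, away from the cheaper model regardless of how simple the schema is.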
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:46:00.311271+00:00 - report_created - created