Report #36998
[cost_intel] Which model to use for high-volume JSON extraction from messy PDF text?
Use Claude 3.5 Haiku for extraction tasks where the output schema is rigid (fewer than 20 fields) and the input is semi-structured text. In that regime it matches Claude 3.5 Sonnet's accuracy (±2%) at roughly a quarter of the cost ($0.80 vs $3.00 per 1M input tokens and $4 vs $15 per 1M output tokens at Anthropic's list prices), but it fails catastrophically on multi-hop reasoning or nested conditionals.
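The routing rule can be sketched as a small Python function. This is a minimal sketch, not a tested policy: the fewer-than-20-fields threshold and the one-step reasoning limit come from this report, while the `ExtractionTask` fields and the model label strings are hypothetical placeholders to be replaced with whatever identifiers your client uses. Cross-referencing between paragraphs should be counted as more than one reasoning step.

```python
from dataclasses import dataclass

# Hypothetical model labels; substitute the identifiers your API client expects.
HAIKU = "claude-3.5-haiku"
SONNET = "claude-3.5-sonnet"

# Threshold from the report: rigid schemas with fewer than 20 fields.
MAX_FIELDS_EXCLUSIVE = 20


@dataclass
class ExtractionTask:
    num_fields: int       # number of fields in the flat output schema
    reasoning_steps: int  # 1 = direct lookup; cross-referencing counts as 2+


def choose_model(task: ExtractionTask) -> str:
    """Route rigid, single-step extraction to Haiku and everything else to Sonnet."""
    rigid_schema = task.num_fields < MAX_FIELDS_EXCLUSIVE
    single_step = task.reasoning_steps <= 1
    return HAIKU if rigid_schema and single_step else SONNET


print(choose_model(ExtractionTask(num_fields=12, reasoning_steps=1)))  # claude-3.5-haiku
print(choose_model(ExtractionTask(num_fields=12, reasoning_steps=2)))  # claude-3.5-sonnet
print(choose_model(ExtractionTask(num_fields=20, reasoning_steps=1)))  # claude-3.5-sonnet
```

Routing on a declared property of the job, rather than on model output, keeps the decision cheap and deterministic. The catch is that someone has to label each extraction job's reasoning depth honestly.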
Journey Context:
Frontier models are overkill for 'parse this table into JSON.' Claude 3.5 Haiku has surprisingly strong instruction following for structured generation. Its failure mode isn't gradual degradation; it's sudden hallucination of field values when the input text is ambiguous or requires cross-referencing across paragraphs (e.g., 'if Section A says X, use Field 2, else Field 3'). Use Sonnet only when the extraction logic requires more than one step of reasoning.
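Because the failure shows up as confidently wrong field values rather than degraded output, one way to act on it is to validate Haiku's JSON and escalate to Sonnet when the check fails. The sketch below uses a simple grounding heuristic that is not from this report: any string field whose value does not appear in the source text (after collapsing whitespace and lowercasing) is treated as possibly hallucinated. `cheap_extract` and `strong_extract` are hypothetical callables standing in for your Haiku and Sonnet calls; each takes the source text and returns the model's raw JSON string.

```python
import json
import re
from typing import Callable


def _normalize(s: str) -> str:
    # Collapse whitespace and lowercase so PDF line breaks don't cause false misses.
    return re.sub(r"\s+", " ", s).strip().lower()


def ungrounded_fields(record: dict, source_text: str) -> list:
    """Return names of non-empty string fields whose values are absent from the source."""
    haystack = _normalize(source_text)
    return [
        name
        for name, value in record.items()
        if isinstance(value, str) and value and _normalize(value) not in haystack
    ]


def extract_with_escalation(
    source_text: str,
    cheap_extract: Callable[[str], str],   # hypothetical wrapper around a Haiku call
    strong_extract: Callable[[str], str],  # hypothetical wrapper around a Sonnet call
) -> tuple:
    """Try the cheap model first; escalate if its output is not valid, grounded JSON."""
    try:
        record = json.loads(cheap_extract(source_text))
        if isinstance(record, dict) and not ungrounded_fields(record, source_text):
            return record, "cheap"
    except json.JSONDecodeError:
        pass  # malformed JSON from the cheap model also triggers escalation
    return json.loads(strong_extract(source_text)), "strong"


# Usage with stub extractors in place of real model calls.
text = "Invoice No: INV-0042\nVendor:  Acme   Corp\nTotal: 118.00"
good = lambda t: '{"invoice_no": "INV-0042", "vendor": "Acme Corp"}'
bad = lambda t: '{"invoice_no": "INV-0042", "vendor": "Acme Corporation"}'
print(extract_with_escalation(text, good, bad)[1])  # cheap
print(extract_with_escalation(text, bad, good)[1])  # strong
```

The check only works for fields copied verbatim from the text. Values the model is asked to normalize (dates, currencies, enums) will look ungrounded and trigger escalation, so in practice you would exempt such fields or validate them with their own rules. It also cannot catch the cross-referencing case, where a real value from the text lands in the wrong field.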
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:34:39.601682+00:00 - report_created - created