Report #53127
[cost_intel] Which Claude model to use for JSON extraction from unstructured text?
Use Haiku 3.5 for extraction tasks with fewer than 10 defined fields and a clear schema. Sonnet only becomes necessary when extraction requires multi-hop reasoning (e.g., calculating derived values from multiple fields or inferring implicit relationships). For direct extraction, Haiku matches Sonnet's F1 to within 0.02 at one-eighth of the cost.
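The rule of thumb above can be sketched as a small routing function. This is a minimal sketch under stated assumptions: `FieldSpec`, `pick_model`, and the model labels are hypothetical names, not part of any Anthropic SDK. Schemas with 10 or more direct fields fall outside the stated rule, so the sketch sends them to a benchmark instead of guessing.

```python
from dataclasses import dataclass

# Hypothetical routing labels; substitute the model IDs your account actually uses.
HAIKU = "haiku-3.5"
SONNET = "sonnet"
BENCHMARK = "benchmark-both"


@dataclass
class FieldSpec:
    name: str
    # True when the value must be derived or inferred (e.g. a total computed
    # from several fields) rather than copied directly from the text.
    requires_inference: bool = False


def pick_model(fields: list[FieldSpec], max_direct_fields: int = 10) -> str:
    """Route a schema using the report's heuristic (a sketch, not a guarantee)."""
    # Any field needing multi-hop reasoning is the case where Sonnet is needed.
    if any(f.requires_inference for f in fields):
        return SONNET
    # Small, direct-mapping schemas are where Haiku is reported to match Sonnet.
    if len(fields) < max_direct_fields:
        return HAIKU
    # Larger schemas are not covered by the rule; measure before choosing.
    return BENCHMARK
```

For example, a schema of `vendor` and `date` routes to Haiku, while adding a `total` field marked `requires_inference=True` routes it to Sonnet.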
Journey Context:
Engineers often default to Sonnet for JSON extraction because it feels like a 'complex' task. But direct extraction is pattern matching, not reasoning, and Haiku 3.5's instruction following is sufficient when each field maps directly to text in the input. Haiku's failure mode is reasoning-dependent extraction, where the value is not stated explicitly in the text and must be inferred (e.g., 'sentiment' from subtle cues, or calculated totals). Benchmark your specific schema: if more than 95% of Haiku's outputs pass Pydantic validation, don't pay 8x for Sonnet. The signature of quality degradation is 'hallucinated nulls': Haiku returns null for fields that require one-hop inference.
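One way to run that benchmark, sketched with the standard library only. `validates` is a hypothetical stand-in for Pydantic validation (a real setup would validate against your own Pydantic model), and `SCHEMA`, `hallucinated_nulls`, and `summarize` are illustrative names. The sketch assumes you have hand-labeled gold answers for a sample of documents. Because nullable fields pass validation even when the model wrongly returns null, it counts hallucinated nulls separately from the pass rate.

```python
import json
from collections import Counter
from typing import Any

# Hypothetical schema: field name -> accepted Python type(s). A stdlib
# stand-in for a Pydantic model whose fields are all Optional.
SCHEMA: dict[str, Any] = {"vendor": str, "invoice_date": str, "total": (int, float)}


def validates(raw: str, schema: dict[str, Any]) -> bool:
    """True if raw parses to a JSON object containing every schema key,
    each value being null or of the accepted type(s)."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict) or not schema.keys() <= obj.keys():
        return False
    return all(obj[k] is None or isinstance(obj[k], t) for k, t in schema.items())


def hallucinated_nulls(pred: dict, gold: dict) -> list[str]:
    """Fields the model returned as null although the labeled answer has a value."""
    return [k for k, v in gold.items() if v is not None and pred.get(k) is None]


def summarize(outputs: list[str], golds: list[dict], schema: dict[str, Any],
              threshold: float = 0.95) -> dict[str, Any]:
    """Validation pass rate plus per-field hallucinated-null counts."""
    passed = 0
    nulls: Counter = Counter()
    for raw, gold in zip(outputs, golds):
        if not validates(raw, schema):
            continue
        passed += 1
        # Only outputs that validated are checked for hallucinated nulls.
        nulls.update(hallucinated_nulls(json.loads(raw), gold))
    rate = passed / len(outputs) if outputs else 0.0
    return {
        "pass_rate": rate,
        "stay_on_haiku": rate > threshold,
        "hallucinated_nulls": dict(nulls),
    }
```

A high pass rate with hallucinated nulls concentrated in one field points at a field that needs inference, which is exactly where the report expects Haiku to fall short.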
Lifecycle
2026-06-19T19:40:13.870292+00:00— report_created — created