Report #92962
[cost_intel] Using Claude 3.5 Sonnet for all structured JSON extraction tasks, on the assumption that smaller models hallucinate schemas
Deploy Claude 3.5 Haiku for extraction tasks with Pydantic-constrained JSON outputs under 4k tokens; it matches Sonnet's F1 within 3% at roughly 1/7.5 the input cost ($0.80 vs $6.00 per 1M input tokens).
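To make the cost comparison concrete, here is a minimal sketch that estimates input-token spend for a given workload. The two prices are the figures quoted in this report and have not been checked against current pricing; the workload numbers and the `input_cost_usd` helper are hypothetical, and output-token costs are not covered.

```python
# Input prices in USD per 1M tokens, as quoted in this report (not independently verified).
PRICE_PER_MTOK_INPUT = {"haiku": 0.80, "sonnet": 6.00}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for the given model and token count."""
    return PRICE_PER_MTOK_INPUT[model] * input_tokens / 1_000_000


# Hypothetical workload: 200k extraction calls per month at about 2,500 input tokens each.
monthly_tokens = 200_000 * 2_500

for model in PRICE_PER_MTOK_INPUT:
    cost = input_cost_usd(model, monthly_tokens)
    print(f"{model}: ${cost:,.2f} per month (input tokens only)")
```

Swap in your own call volume and average prompt size to see whether the saving is material for your workload.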
Journey Context:
Engineers default to Sonnet after early Claude 3 Haiku failures on complex reasoning, but Claude 3.5 Haiku's instruction following is sufficient for constrained extraction. With a schema constraint in place, the failure mode shifts from hallucination to schema violation, which validation can catch and a retry can correct. Sonnet becomes necessary only for multi-hop reasoning across contexts longer than 10k tokens or for nested conditional extraction.
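The validate-and-retry pattern described above can be sketched as follows. This is an illustration under stated assumptions, not a verified workaround: `call_model` is a hypothetical callable that wraps whatever client you use, `INVOICE_SCHEMA` is an invented example schema, and the hand-rolled `validate` function is a standard-library stand-in for a Pydantic model (in Pydantic v2, `Model.model_validate_json(raw)` would play the same role).

```python
import json
from typing import Callable

# Hypothetical example schema: field name -> required Python type.
INVOICE_SCHEMA = {"vendor": str, "total": float, "currency": str}


class SchemaViolation(ValueError):
    """Raised when model output does not match the expected schema."""


def validate(raw: str, schema: dict) -> dict:
    """Parse raw JSON and check it has exactly the schema's keys and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaViolation(f"invalid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaViolation("top-level value must be an object")
    if set(data) != set(schema):
        raise SchemaViolation(f"expected keys {sorted(schema)}, got {sorted(data)}")
    for key, expected in schema.items():
        value = data[key]
        # Accept ints where floats are expected; reject bools, since this
        # example schema has no boolean fields and bool is a subclass of int.
        ok = isinstance(value, expected) or (expected is float and isinstance(value, int))
        if isinstance(value, bool) or not ok:
            raise SchemaViolation(f"field {key!r} should be {expected.__name__}")
    return data


def extract_with_retries(
    call_model: Callable[[str], str],  # hypothetical wrapper around your model client
    prompt: str,
    schema: dict,
    max_attempts: int = 3,
) -> dict:
    """Call the model until its output validates, feeding errors back on retry."""
    last_error = None
    for _ in range(max_attempts):
        full_prompt = prompt
        if last_error is not None:
            # Tell the model why the previous attempt was rejected.
            full_prompt = f"{prompt}\n\nPrevious output was rejected: {last_error}"
        raw = call_model(full_prompt)
        try:
            return validate(raw, schema)
        except SchemaViolation as exc:
            last_error = exc
    raise SchemaViolation(f"no valid output after {max_attempts} attempts: {last_error}")
```

Because a schema violation is detected mechanically, each retry costs one extra call to the cheaper model; only inputs that keep failing after `max_attempts` need to be escalated or reviewed.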
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:37:29.917853+00:00— report_created — created