Report #24407
[cost_intel] Haiku 3.5 matches Sonnet on schema-rigid tasks, but the agent defaults to Sonnet for all JSON work
Use Haiku 3.5 (or GPT-4o-mini) for structured extraction with constrained JSON schemas; reserve Sonnet for tasks requiring reasoning, creativity, or chains of more than 10 steps. Benchmark on your own schema: if Haiku achieves >95% schema adherence, the cost reduction is 5-10x with no meaningful quality loss on that task.
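The benchmark step can be sketched roughly as follows. This is a minimal illustration, not the report author's harness: the `REQUIRED_FIELDS` schema, the field names, and the helper names `adheres` and `adherence_rate` are all hypothetical, and the check covers only top-level keys and types. A real pipeline would likely validate against a full JSON Schema instead.

```python
import json

# Hypothetical schema for illustration: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"invoice_id": str, "total": (int, float), "currency": str}


def adheres(raw_output, required=REQUIRED_FIELDS):
    """Return True if raw_output is a JSON object with exactly the required
    fields, each holding a value of the expected type."""
    try:
        obj = json.loads(raw_output)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    if not isinstance(obj, dict) or set(obj) != set(required):
        return False
    for key, expected in required.items():
        value = obj[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


def adherence_rate(outputs):
    """Fraction of model outputs that pass the schema check."""
    if not outputs:
        raise ValueError("need at least one output to compute a rate")
    return sum(adheres(o) for o in outputs) / len(outputs)


# Hand-written stand-ins for raw model responses (not real model output).
sample_outputs = [
    '{"invoice_id": "A1", "total": 12.5, "currency": "USD"}',
    '{"invoice_id": "A2", "total": "12.5", "currency": "USD"}',  # wrong type
    "not json at all",                                           # unparseable
    '{"invoice_id": "A3", "total": 3, "currency": "EUR"}',
]
sample_rate = adherence_rate(sample_outputs)
```

In practice you would collect the cheaper model's raw responses on a held-out sample of your own documents, pass them to `adherence_rate`, and compare the result against the 95% threshold mentioned above.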
Journey Context:
Teams instinctively route all 'AI' work through the strongest model available, treating cost as secondary to reliability. However, structured extraction is largely a pattern-matching task; Haiku 3.5's 200K context and instruction-following capabilities are sufficient for extracting entities from documents or API responses where the output format is rigid. The failure modes differ: Haiku may hallucinate on open-ended generation but rarely violates explicit JSON constraints when temperature is set to 0 and a schema is provided. The alternative of using Sonnet 'to be safe' results in 5-10x cost inflation for high-volume pipelines (e.g., processing 1M documents/month). Fine-tuning is another path, but for stable schemas, zero-shot Haiku is already at the cost floor. The remaining quality gap (roughly 5%) only appears on edge cases with nested reasoning; if your task doesn't require the model to 'think,' don't pay for thinking.
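To see how a per-token price gap compounds at volume, here is a rough cost sketch. Every number in it is a placeholder assumption, not a published price: the per-million-token rates, the token counts per document, and the names `monthly_cost`, `SMALL_MODEL`, and `LARGE_MODEL` are hypothetical. Substitute your provider's current list prices and your own measured token counts.

```python
def monthly_cost(docs, in_tokens, out_tokens, price_in, price_out):
    """Monthly spend in USD.

    docs       -- documents processed per month
    in_tokens  -- average input tokens per document
    out_tokens -- average output tokens per document
    price_in   -- USD per million input tokens
    price_out  -- USD per million output tokens
    """
    tokens_cost = in_tokens * price_in + out_tokens * price_out
    return docs * tokens_cost / 1_000_000


# Placeholder prices in USD per million tokens (NOT real list prices).
SMALL_MODEL = {"price_in": 1.0, "price_out": 5.0}
LARGE_MODEL = {"price_in": 5.0, "price_out": 25.0}

# Assumed workload: 1M documents/month, 2,000 input and 200 output tokens each.
DOCS, IN_TOKENS, OUT_TOKENS = 1_000_000, 2_000, 200

small_cost = monthly_cost(DOCS, IN_TOKENS, OUT_TOKENS, **SMALL_MODEL)
large_cost = monthly_cost(DOCS, IN_TOKENS, OUT_TOKENS, **LARGE_MODEL)
cost_ratio = large_cost / small_cost

print(f"small: ${small_cost:,.0f}  large: ${large_cost:,.0f}  ratio: {cost_ratio:.1f}x")
```

With these placeholder numbers the larger model costs 5x as much per month. The real ratio depends entirely on the prices and token counts you plug in, so it is worth recomputing before committing to a routing policy.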
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:22:34.203651+00:00— report_created — created