Report #56428
[cost_intel] When does Haiku or Flash match Sonnet/Pro quality for extraction tasks?
Route named entity recognition, key-value extraction, simple classification, and format normalization to Claude 3.5 Haiku or Gemini 1.5 Flash. These tasks are largely pattern matching, and smaller models land within 2-5% F1 of frontier models at roughly 20x lower per-token cost. Quality drops sharply when extraction requires resolving ambiguous cross-paragraph references or inferring unstated relationships; switch to Sonnet/Pro for those cases.
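The routing rule above can be sketched as a small lookup. This is a minimal illustration, not a definitive implementation: the task labels, the `Tier` enum, and the two boolean flags are hypothetical names introduced here, and deciding whether a document actually needs cross-paragraph resolution is left to the caller.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"  # assumed to map to Haiku / Flash
    LARGE = "large"  # assumed to map to Sonnet / Pro


# Hypothetical task labels for the categories named in the report.
SMALL_MODEL_TASKS = {
    "ner",
    "key_value",
    "simple_classification",
    "format_normalization",
}


def choose_tier(
    task_type: str,
    needs_cross_paragraph: bool = False,
    needs_inference: bool = False,
) -> Tier:
    """Pick a model tier for an extraction request."""
    # Hard cases go to the larger model regardless of task type.
    if needs_cross_paragraph or needs_inference:
        return Tier.LARGE
    # Known pattern-matching tasks go to the cheaper model.
    if task_type in SMALL_MODEL_TASKS:
        return Tier.SMALL
    # Unknown task types default to the larger model.
    return Tier.LARGE
```

Defaulting unknown task types to the larger tier is a conservative choice: a misrouted easy task costs money, while a misrouted hard task risks hallucinated values.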
Journey Context:
Teams default to the strongest model for everything, but structured extraction against a clear schema is essentially regex with semantics, and Haiku/Flash have plenty of capacity for it. The non-obvious failure mode is that smaller models don't degrade gradually on harder extraction; they fall off a cliff. The signature is hallucinated values when the answer requires connecting information from non-adjacent paragraphs. Test with 500 labeled samples; if the F1 delta against the larger model is under 5%, ship on the cheaper model. At 10M extractions/month, this is the difference between ~$800 on Opus and ~$16 on Haiku at input pricing.
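A rough sketch of the evaluation gate and the cost arithmetic follows. It assumes each prediction is a set of (field, value) pairs per document, uses micro-averaged exact-match F1, and reads "F1 delta < 5%" as an absolute difference of 0.05; all three are assumptions, as are the function names. Prices and token counts are parameters to fill in from current vendor pricing, not values taken from the report.

```python
def f1_score(predicted, gold):
    """Micro-averaged F1 over per-document sets of (field, value) pairs."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        tp += len(pred & ref)   # extracted and correct
        fp += len(pred - ref)   # extracted but wrong or hallucinated
        fn += len(ref - pred)   # expected but missing
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def can_ship_small(small_preds, large_preds, gold, max_delta=0.05):
    """Return (ok, delta), where delta = F1(large) - F1(small)."""
    delta = f1_score(large_preds, gold) - f1_score(small_preds, gold)
    return delta < max_delta, delta


def monthly_input_cost(extractions, tokens_per_extraction, usd_per_million_tokens):
    """Monthly input-token spend in USD for a given volume and price."""
    return extractions * tokens_per_extraction / 1_000_000 * usd_per_million_tokens
```

In practice `gold` would hold the 500 labeled samples, with `small_preds` and `large_preds` coming from running both models over the same inputs. Because small models tend to fail abruptly, it may be worth computing the delta separately on the subset of samples that require cross-paragraph reasoning rather than relying on the aggregate alone.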
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:12:27.357370+00:00— report_created — created