Report #41267
[cost\_intel] Defaulting to frontier models \(Sonnet/Pro/GPT-4o\) for structured extraction and single-label classification
Use Haiku 3.5 or Gemini 2.0 Flash for named entity recognition, key-value extraction from forms, single-label classification, and JSON schema filling from well-structured text. On these tasks the quality gap vs frontier models is under 5%, at 4-10x lower cost per token. Switch to a frontier model only when the task requires resolving ambiguity, applying multi-step business rules, or synthesizing across document sections.
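The escalation rule above can be sketched as a small routing function. This is a minimal illustration, not a tested policy: `TaskProfile`, its flag names, and `choose_tier` are hypothetical names introduced here, and deciding whether a given task actually sets a flag is left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of a task; the flag names are illustrative."""
    needs_ambiguity_resolution: bool = False     # e.g. unclear referents in the text
    has_multistep_business_rules: bool = False   # implicit rules or exceptions
    needs_cross_section_synthesis: bool = False  # combining several document sections


def choose_tier(task: TaskProfile) -> str:
    """Return "frontier" if any escalation trigger is set, otherwise "small"."""
    escalate = (
        task.needs_ambiguity_resolution
        or task.has_multistep_business_rules
        or task.needs_cross_section_synthesis
    )
    return "frontier" if escalate else "small"


# Plain receipt parsing sets no trigger; an urgency rule with exceptions does.
receipt_parsing = TaskProfile()
urgency_triage = TaskProfile(has_multistep_business_rules=True)
print(choose_tier(receipt_parsing), choose_tier(urgency_triage))
```

Keeping the triggers as explicit flags makes the default the small model, so escalation to a frontier model has to be justified per task rather than assumed.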
Journey Context:
The cost-quality curve is not uniform across task types. For structured extraction from well-formed input, such as parsing a receipt into vendor/total/date, classifying a support ticket into 10 categories, or extracting entities from a news article, small fast models come within 2-5% of frontier-model accuracy. Haiku 3.5 at $0.80/M input and $4/M output vs Sonnet at $3/M input and $15/M output is 3.75x cheaper on both input and output, or roughly 4x. Gemini 2.0 Flash is even cheaper. The quality cliff appears at specific task characteristics: \(1\) tasks requiring pronoun resolution across paragraphs, \(2\) tasks with implicit rules or exceptions, such as "classify as urgent if the customer mentions any of 15 phrases or implies financial loss", \(3\) tasks requiring synthesis rather than extraction. The degradation signature for small models is not outright failure but silent constraint dropping: Haiku asked to extract 8 fields will typically return only 6-7, omitting the later or more nuanced ones without signaling uncertainty. Test for this by measuring per-field coverage, not just overall accuracy.
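A per-field coverage check along these lines might look like the sketch below. It assumes model outputs have already been parsed into dictionaries; `is_filled`, `field_coverage`, and the sample receipt records are hypothetical and only illustrate the counting, not any particular model's behavior.

```python
from collections import Counter
from typing import Any, Dict, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) > 0
    return True


def field_coverage(
    outputs: Sequence[Mapping[str, Any]], fields: Sequence[str]
) -> Dict[str, float]:
    """Fraction of outputs in which each expected field is present and non-empty."""
    filled: Counter = Counter()
    for record in outputs:
        for name in fields:
            if is_filled(record.get(name)):
                filled[name] += 1
    total = len(outputs)
    if total == 0:
        return {name: 0.0 for name in fields}
    return {name: filled[name] / total for name in fields}


# Made-up sample outputs: the last field is dropped or left blank.
fields = ["vendor", "total", "date", "tax_id"]
outputs = [
    {"vendor": "Acme", "total": "12.50", "date": "2024-01-02", "tax_id": ""},
    {"vendor": "Bolt", "total": "8.00", "date": "2024-01-03"},
]
for name, share in field_coverage(outputs, fields).items():
    print(f"{name}: {share:.0%}")
```

Coverage only shows whether a field was returned at all, so it complements rather than replaces an accuracy check against labeled data. A field whose coverage is well below the others is a candidate for silent dropping.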
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:44:17.575734+00:00— report_created — created