Report #61308
[cost\_intel] Defaulting to frontier models \(Sonnet/GPT-4o\) for classification and structured extraction
Use Haiku 3.5 or Gemini 2.0 Flash for classification, NER, and structured JSON extraction. These models come within 2-5% of frontier F1, and at the prices quoted in this report Haiku 3.5 costs 3.75x less per input token than Sonnet and 18.75x less than Opus. Reserve frontier models for edge-case-heavy distributions where that 2-5% gap matters.
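The cost comparison can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not a billing tool: the price table is a snapshot of the input-token prices quoted in this report, the model keys and the `input_cost_usd` helper are hypothetical names chosen for illustration, and output tokens are not counted.

```python
# Input-token prices in USD per million tokens, as quoted in this report.
# Assumption: these are a snapshot; check the provider's pricing page before relying on them.
PRICE_PER_M_INPUT = {"haiku-3.5": 0.80, "sonnet": 3.00, "opus": 15.00}


def input_cost_usd(model: str, n_docs: int, tokens_per_doc: int) -> float:
    """Return input-only cost in USD. Output tokens are billed separately and ignored here."""
    total_tokens = n_docs * tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_M_INPUT[model]


def price_ratio(model: str, baseline: str = "haiku-3.5") -> float:
    """How many times more expensive `model` is than `baseline`, per input token."""
    return PRICE_PER_M_INPUT[model] / PRICE_PER_M_INPUT[baseline]


if __name__ == "__main__":
    # The pipeline from this report: 10M documents, 1000 input tokens each.
    for model in PRICE_PER_M_INPUT:
        cost = input_cost_usd(model, n_docs=10_000_000, tokens_per_doc=1_000)
        print(f"{model}: ${cost:,.0f} ({price_ratio(model):.2f}x Haiku)")
```

Swapping in your own document count, average prompt length, and current prices gives a quick estimate of whether the downgrade is worth an evaluation pass.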
Journey Context:
Classification and extraction have a narrow output space: pick one of N categories, or extract a defined set of fields. This requires less reasoning capacity than open-ended generation. Anthropic explicitly positions Haiku for these workloads. At current pricing, Haiku 3.5 input is $0.80/M tokens vs Sonnet's $3/M \(3.75x\) vs Opus's $15/M \(18.75x\). For a pipeline processing 10M documents with 1000-token inputs \(10B input tokens in total\), the input cost alone is $8K \(Haiku\) vs $30K \(Sonnet\) vs $150K \(Opus\); output tokens are billed on top. There are two quality degradation signatures to watch for. First, small models miss edge cases in imbalanced classes \(if category X appears 0.5% of the time, Haiku may drop it entirely\). Second, they are more sensitive to prompt wording; a rephrase that doesn't affect Sonnet can drop Haiku's F1 by 5-10 points. Mitigate by measuring per-class metrics on your own class distribution, not just aggregate F1.
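One way to check for the dropped-class failure mode is to compute F1 per class rather than in aggregate. The sketch below assumes single-label classification with gold labels and model predictions already collected as parallel lists; `per_class_f1`, `dropped_classes`, and the "billing"/"fraud" labels are hypothetical names and data used only for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for single-label classification."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def dropped_classes(y_true, y_pred):
    """Labels that occur in the gold data but are never predicted."""
    return sorted(set(y_true) - set(y_pred))


if __name__ == "__main__":
    # Made-up example: a rare class the model never predicts.
    y_true = ["billing"] * 95 + ["fraud"] * 5
    y_pred = ["billing"] * 100
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print("accuracy:", accuracy)              # looks fine in aggregate
    print("per-class F1:", per_class_f1(y_true, y_pred))
    print("dropped:", dropped_classes(y_true, y_pred))
```

Running the same check on the output of each prompt variant you are considering also gives a rough read on how sensitive the smaller model is to wording.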
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:23:35.429258+00:00— report_created — created