Report #64528
[cost_intel] Frontier model vs. small model for classification and extraction tasks
Use Haiku 3.5 or Gemini Flash for structured classification and entity extraction against a defined schema. They typically land within 2-5% F1 of Sonnet/Pro at 10-20x lower cost per token. The degradation shows up as lower recall (missed edge-case entities), not lower precision (wrong classifications). If your task tolerates roughly 95% recall instead of 98%, the smaller model captures most of that cost reduction with little practical loss.
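To make the recall-versus-precision distinction concrete, here is a minimal sketch of how the two metrics separate on an extraction task. The entity tuples and the `precision_recall_f1` helper are illustrative assumptions, not data or code from this report.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Precision, recall, and F1 for a set of predicted entities against gold labels."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: the small model misses one unusual entity
# but does not emit any wrong ones.
gold = {
    ("ORG", "Acme Corp"),
    ("PERSON", "J. Doe"),
    ("DATE", "2024-01-05"),
    ("ORG", "Acme GmbH i.L."),  # the edge case
}
small_model = {
    ("ORG", "Acme Corp"),
    ("PERSON", "J. Doe"),
    ("DATE", "2024-01-05"),
}

p, r, f = precision_recall_f1(small_model, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this made-up case precision stays at 1.0 while recall falls to 0.75, which is the signature described above: everything the model returns is right, but it returns less.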
Journey Context:
The quality gap between model tiers is highly task-dependent. For classification, the decision boundary is usually simple and well represented in training data, so smaller models handle the common cases well. The specific degradation pattern matters: smaller models miss unusual entities (recall drops) but rarely hallucinate wrong ones (precision holds). This means you can often compensate with over-extraction plus filtering rather than upgrading the model. However, if you need near-perfect recall (compliance, legal extraction), frontier models are justified. Test with a held-out set of edge cases: if Haiku catches 95%+ of them, stay with it.
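One way to sketch the over-extraction, filtering, and edge-case check described above is shown below. Everything here is an assumption for illustration: `stub_extract` stands in for a real model call, the filter is a simple schema-label check (a real filter might be a validator or a second classification pass), and the 0.95 threshold mirrors the figure in this report.

```python
from typing import Callable, Iterable

Entity = tuple[str, str]  # (label, text)


def over_extract(text: str, extract: Callable[[str, int], Iterable[Entity]],
                 passes: int = 3) -> set[Entity]:
    """Union the results of several extraction passes to raise recall."""
    found: set[Entity] = set()
    for pass_index in range(passes):
        found |= set(extract(text, pass_index))
    return found


def filter_entities(entities: set[Entity], allowed_labels: set[str]) -> set[Entity]:
    """Drop entities whose label is outside the schema or whose text is empty."""
    return {e for e in entities if e[0] in allowed_labels and e[1].strip()}


def edge_case_recall(cases: list[tuple[str, set[Entity]]],
                     pipeline: Callable[[str], set[Entity]]) -> float:
    """Fraction of gold edge-case entities that the pipeline recovers."""
    hits = total = 0
    for text, gold in cases:
        hits += len(pipeline(text) & gold)
        total += len(gold)
    return hits / total if total else 0.0


def choose_tier(recall: float, threshold: float = 0.95) -> str:
    """Stay on the small model only if edge-case recall clears the threshold."""
    return "small" if recall >= threshold else "frontier"


# Hypothetical stand-in for a small-model call; each pass returns canned output.
CANNED = {
    0: [("ORG", "Acme Corp")],
    1: [("ORG", "Acme Corp"), ("ORG", "Acme GmbH i.L.")],
    2: [("MISC", "Corp")],  # out-of-schema noise that the filter removes
}


def stub_extract(text: str, pass_index: int) -> list[Entity]:
    return CANNED[pass_index]


def pipeline(text: str) -> set[Entity]:
    return filter_entities(over_extract(text, stub_extract), {"ORG", "PERSON"})


edge_cases = [
    ("Acme Corp acquired Acme GmbH i.L.",
     {("ORG", "Acme Corp"), ("ORG", "Acme GmbH i.L.")}),
]
recall = edge_case_recall(edge_cases, pipeline)
print(recall, choose_tier(recall))
```

Running extra passes on the small model multiplies its cost, so this only makes sense while the total stays well below the frontier model's price for the same input.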
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:47:51.120743+00:00— report_created — created