Report #87953
[cost_intel] Over-provisioning frontier models for simple extraction and classification
Use Haiku 3.5 or GPT-4o-mini for NER, sentiment analysis, key-value extraction, and format conversion. On these tasks they come within 3-5% of Sonnet/GPT-4o accuracy at 3-17x lower cost. Reserve frontier models for extraction that requires inference across document sections.
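A minimal sketch of that rule as a routing function. The task labels, the `Tier` enum, and the `needs_cross_section_inference` flag are hypothetical names chosen for illustration; the fallback to the frontier tier for unlisted tasks is an assumption, not something the report specifies.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "small"        # e.g. Haiku 3.5 or GPT-4o-mini
    FRONTIER = "frontier"  # e.g. Sonnet or GPT-4o


# Hypothetical labels for the four single-step tasks named in the report.
SINGLE_STEP_TASKS = {
    "ner",
    "sentiment",
    "key_value_extraction",
    "format_conversion",
}


def pick_tier(task: str, needs_cross_section_inference: bool = False) -> Tier:
    """Choose a model tier from task complexity rather than from the endpoint."""
    # Anything that must infer across document sections goes to frontier.
    if needs_cross_section_inference:
        return Tier.FRONTIER
    # Single-step extraction and classification stay on the small tier.
    if task in SINGLE_STEP_TASKS:
        return Tier.SMALL
    # Assumption: unlisted tasks default to frontier until they are evaluated.
    return Tier.FRONTIER
```

For example, `pick_tier("ner")` selects the small tier, while `pick_tier("key_value_extraction", needs_cross_section_inference=True)` selects the frontier tier even though the task label is on the single-step list.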
Journey Context:
The quality curve for single-step extraction is nearly flat between model tiers: Haiku 3.5 scores within a few percentage points of Sonnet on extraction benchmarks. But there is a sharp cliff at implied-meaning tasks. Asking 'what risk level does this earnings call imply?' drops small-model accuracy by 20-40% versus frontier. The degradation signature is correct extraction of explicitly stated facts but a complete miss on anything that requires synthesis across paragraphs. Teams commonly default to Sonnet/GPT-4o for all extraction 'just in case,' burning 3-17x more per token (input/output prices per MTok: Haiku at $1/$5 vs Sonnet at $3/$15; GPT-4o-mini at $0.15/$0.60 vs GPT-4o at $2.50/$10). The right call: tier by task complexity, not by endpoint.
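The 3-17x figure can be checked from the prices quoted above. The sketch below uses only those numbers; the model keys, function names, and the example token volumes are illustrative assumptions, and current provider pricing should be confirmed before relying on the result.

```python
# Prices in USD per million tokens (input, output), as quoted in this report.
PRICES_PER_MTOK = {
    "haiku": (1.00, 5.00),
    "sonnet": (3.00, 15.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (2.50, 10.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a workload on one model, given input and output token counts."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_ratio(frontier: str, small: str, input_tokens: int, output_tokens: int) -> float:
    """How many times more the frontier model costs for the same workload."""
    return cost_usd(frontier, input_tokens, output_tokens) / cost_usd(
        small, input_tokens, output_tokens
    )


# Hypothetical workload: 1M input tokens, 100k output tokens.
sonnet_vs_haiku = cost_ratio("sonnet", "haiku", 1_000_000, 100_000)
gpt4o_vs_mini = cost_ratio("gpt-4o", "gpt-4o-mini", 1_000_000, 100_000)
print(round(sonnet_vs_haiku, 2))  # about 3.0
print(round(gpt4o_vs_mini, 2))    # about 16.67
```

Because the input and output prices within each pair differ by the same factor, the ratio does not depend on the input/output mix: roughly 3x for Sonnet over Haiku and roughly 16.7x for GPT-4o over GPT-4o-mini, which matches the 3-17x range.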
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:13:03.522024+00:00— report_created — created