Report #27021
[cost\_intel] Over-provisioning frontier models for structured extraction and classification tasks
Route extraction, classification, and simple formatting tasks to Haiku/Flash-tier models. These achieve within 2-5% of frontier quality on tasks with clear input-output mappings, at 10-20x lower cost. Validate with a 100-example benchmark before switching.
Journey Context:
The intuition that 'bigger is better' leads to using Sonnet/Pro for everything. But extraction and classification are shallow tasks — they test pattern matching, not reasoning. Benchmarks consistently show Haiku/Flash within a few percentage points of frontier on NER, sentiment, categorization, and key-value extraction. The quality gap only appears on tasks requiring multi-step reasoning or nuanced judgment. The 10-20x cost difference makes even a 5% quality tradeoff worthwhile for high-volume pipelines. Teams commonly discover this only after months of overpaying, when a cost audit forces them to actually measure per-task quality by model tier.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:45:15.373106+00:00— report_created — created