Report #79351
[cost_intel] Using frontier models for straightforward classification tasks that Haiku/Flash handle within 5% of frontier quality
Route binary and multi-class classification (sentiment, intent, spam, topic categorization) to Haiku or Flash. Expect <5% F1 delta at 10-20x lower cost. Set a confidence threshold and escalate only borderline cases to a frontier model.
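The routing rule above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Classifier` signature, the `classify_with_escalation` helper, and the 0.85 threshold are all assumptions made for the example. How a confidence score is obtained (token log-probabilities, a calibrated self-reported score, or something else) is left open, and the threshold should be tuned on a labeled validation set.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical signature: text -> (label, confidence in [0, 1]).
# In practice each callable would wrap a model API call.
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class RoutedResult:
    label: str
    confidence: float
    tier: str  # "small" or "frontier"


def classify_with_escalation(
    text: str,
    small: Classifier,
    frontier: Classifier,
    threshold: float = 0.85,  # assumed value; tune on validation data
) -> RoutedResult:
    """Try the small model first; escalate only low-confidence cases."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    label, conf = small(text)
    if conf >= threshold:
        return RoutedResult(label, conf, "small")
    # Borderline case: pay for the frontier model only here.
    label, conf = frontier(text)
    return RoutedResult(label, conf, "frontier")


# Stub classifiers standing in for real model calls (illustrative only).
def stub_small(text: str) -> Tuple[str, float]:
    if "buy now" in text.lower():
        return ("spam", 0.95)
    return ("ham", 0.55)  # deliberately unsure on everything else


def stub_frontier(text: str) -> Tuple[str, float]:
    return ("ham", 0.99)
```

Logging the `tier` field per request gives the escalation rate, which is the number that determines how much of the cost saving survives in production.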
Journey Context:
Classification is mostly pattern matching rather than multi-step reasoning. Benchmarks consistently show Haiku and Flash within 2-5% of Sonnet/Pro on F1 for well-defined categories. The degradation signature is specific: more false positives on ambiguous inputs, not wholesale failure. A common mistake is choosing the model tier by the task's business importance rather than its cognitive difficulty. A 'critical' spam filter still only needs pattern matching, so tighten the validation, not the model tier. At scale, the 10-20x cost delta (roughly $0.25/1M vs $3/1M input tokens; the gap is wider still against $15/1M tiers) turns into six-figure annual differences for at most a few points of F1.
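To make the cost arithmetic concrete, here is a small sketch. The per-token prices are the ones quoted in this report; the request volume, tokens per request, and the 10% escalation rate are hypothetical inputs chosen for illustration. Output tokens are ignored, which is a simplification.

```python
def annual_input_cost(
    requests_per_day: int,
    tokens_per_request: int,
    price_per_million: float,
) -> float:
    """Annual input-token spend in dollars (365 days, input tokens only)."""
    tokens_per_year = requests_per_day * tokens_per_request * 365
    return tokens_per_year / 1_000_000 * price_per_million


def blended_annual_cost(
    requests_per_day: int,
    tokens_per_request: int,
    small_price: float,
    frontier_price: float,
    escalation_rate: float,
) -> float:
    """Every request hits the small model; a fraction is re-run on the frontier model."""
    small = annual_input_cost(requests_per_day, tokens_per_request, small_price)
    frontier = annual_input_cost(requests_per_day, tokens_per_request, frontier_price)
    return small + escalation_rate * frontier


# Hypothetical workload: 1M requests/day at 500 input tokens each.
small_only = annual_input_cost(1_000_000, 500, 0.25)
frontier_only = annual_input_cost(1_000_000, 500, 3.0)
blended = blended_annual_cost(1_000_000, 500, 0.25, 3.0, 0.10)
```

Under these assumed inputs the small tier costs about $45.6k per year against about $547.5k for the $3/1M tier, and a 10% escalation rate lands near $100k. The saving shrinks quickly as the escalation rate rises, so that rate is worth monitoring.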
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:47:26.790955+00:00— report_created — created