Report #84849
[cost\_intel] Using frontier models for simple classification at 20x the cost with negligible quality gain
Route binary/multi-class classification tasks with clear label definitions and <500 token inputs to Haiku or Flash. Only escalate to Sonnet/Pro when classification requires implicit reasoning or resolving ambiguous edge cases between similar categories.
Journey Context:
On standard classification benchmarks \(intent detection, sentiment, topic routing\), Haiku matches Sonnet within 2-5% accuracy. At $0.25/M input vs $3/M \(Sonnet\) or $15/M \(Opus\), this is a 12-60x cost difference for marginal quality gain. The quality cliff signature is distinctive: when the task requires understanding implicit context or resolving ambiguity between similar categories, Haiku accuracy drops 15-25% while Sonnet degrades only 5-8%. If your validation pipeline rejects >5% of classifications, you have likely hit the small-model limit. Before that point, the cost savings are overwhelming.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:00:14.757577+00:00— report_created — created