Report #85228
[cost\_intel] Using frontier models for simple classification and routing tasks
Route binary/multi-class classification with well-defined categories to Haiku 3.5 or GPT-4o-mini. These match Sonnet/Pro within 2-5% accuracy at 10-20x lower cost. Only escalate to frontier when categories overlap, require sarcasm/implicit-intent detection, or need cross-referencing multiple context pieces.
Journey Context:
On standard classification benchmarks \(sentiment, topic routing, spam, PII detection\), small models achieve near-frontier accuracy. The quality cliff is sharp and predictable: when classification requires pragmatic understanding \(sarcasm, implicit intent, reconciling conflicting signals\), small models drop 15-30% accuracy. The degradation signature is confident misclassification of edge cases rather than hedging — they don't know what they don't know. Cost: Haiku at $0.80/M output vs Opus at $75/M output = ~94x difference. At 1M classifications/day, that's $800 vs $75,000. The routing heuristic: if a human annotator with a 1-page rubric could do it, use a small model. If they'd need to re-read and think, use frontier.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:38:19.364448+00:00— report_created — created