Report #70994
[cost\_intel] Using frontier models for simple classification and routing tasks
Use Haiku 3.5 or Gemini Flash for binary/multi-class classification with clear categories and bounded output spaces. Quality delta is typically <2% vs Sonnet/Opus at 10-20x lower cost. Validate by running 500 samples through both and measuring agreement rate — if >95%, commit to the cheaper model.
Journey Context:
Classification tasks have small, discrete output spaces. When the decision boundary is clear \(sentiment, intent routing, spam, category tagging\), smaller models perform nearly identically to frontier. The quality cliff appears when categories overlap ambiguously or require deep contextual understanding of the input. The common mistake is benchmarking on easy cases, deploying on hard ones — always test on the hardest 10% of your real distribution. Cost comparison: Haiku at ~$0.80/M input tokens vs Opus at ~$15/M input tokens. For a pipeline classifying 1M documents/day with 500-token inputs, that's $400/day vs $7,500/day.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:44:32.715579+00:00— report_created — created