Report #59420
[cost\_intel] Using GPT-4o for high-volume classification or routing decisions
For classification, sentiment analysis, and intent routing tasks with <10 classes and clear label definitions, GPT-4o-mini \(or Claude 3 Haiku\) achieves >98% of GPT-4o's accuracy at roughly 1/20th the cost. The gap shows up mainly on ambiguous edge cases \(class overlap\), where the larger model is better calibrated. Implement a cascaded router: accept mini's prediction when it is confident \(probability >0.9\) and escalate to 4o only on low-confidence or ambiguous inputs \(typically <15% of traffic\).
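A minimal sketch of the cascaded router described above, assuming each model is wrapped in a function that returns a label and a confidence in [0, 1]. The names `cascade_classify`, `stub_small`, and `stub_large` are hypothetical, and the stubs stand in for real model calls. How the confidence is obtained (for example from token log-probabilities, where the API exposes them) is left open here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A classifier takes text and returns (label, confidence in [0, 1]).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class RoutedPrediction:
    label: str
    confidence: float
    model: str  # "small" or "large": which tier produced the label


def cascade_classify(
    text: str,
    small: Classifier,
    large: Classifier,
    threshold: float = 0.9,
) -> RoutedPrediction:
    """Accept the small model's label if confident, else escalate."""
    label, conf = small(text)
    if conf > threshold:
        return RoutedPrediction(label, conf, "small")
    # Low confidence: pay for the larger model on this input only.
    label, conf = large(text)
    return RoutedPrediction(label, conf, "large")


# Hypothetical stand-ins for real model calls (assumptions for illustration).
def stub_small(text: str) -> Tuple[str, float]:
    if "refund" in text.lower():
        return ("billing", 0.97)
    return ("other", 0.55)


def stub_large(text: str) -> Tuple[str, float]:
    return ("account_access", 0.93)


if __name__ == "__main__":
    for msg in ["I want a refund", "cannot log in and also charged twice"]:
        print(msg, "->", cascade_classify(msg, stub_small, stub_large))
```

Recording which tier answered \(the `model` field\) makes it possible to measure the real escalation rate and check it against the <15% estimate.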
Journey Context:
Teams assume classification requires 'smart' models, but classifiers are primarily pattern matching on features that smaller models capture well. GPT-4o costs $5/1M input tokens; mini costs $0.15/1M. For a routing layer processing 1B tokens/month, that's $5,000 vs $150. The quality gap is measurable: on the Banking77 intent dataset, GPT-4o-mini achieves 88.5% accuracy vs 91.2% for 4o. The 2.7-point difference is acceptable for routing \(fallthrough to a human is acceptable\), but would be catastrophic for medical diagnosis. Use confidence thresholds to send the inputs the small model is likely to get wrong to the expensive model.
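The cost comparison can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: it counts input tokens only, uses the per-1M prices quoted above, and treats 15% as the escalation rate \(the upper end of the estimate\). The function names are hypothetical. Escalated traffic is billed on both models, since mini sees every input first.

```python
def monthly_cost(tokens: float, price_per_million: float) -> float:
    """Cost in dollars for `tokens` input tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


def cascade_cost(
    tokens: float,
    small_price: float,
    large_price: float,
    escalation_rate: float,
) -> float:
    """Small model sees all traffic; large model sees only escalated traffic."""
    small = monthly_cost(tokens, small_price)
    large = monthly_cost(tokens * escalation_rate, large_price)
    return small + large


TOKENS_PER_MONTH = 1_000_000_000  # 1B input tokens, as in the text
LARGE_PRICE = 5.00                # $ per 1M input tokens (GPT-4o, as quoted)
SMALL_PRICE = 0.15                # $ per 1M input tokens (mini, as quoted)
ESCALATION_RATE = 0.15            # assumption: upper end of the estimate

if __name__ == "__main__":
    print("large only:", monthly_cost(TOKENS_PER_MONTH, LARGE_PRICE))
    print("small only:", monthly_cost(TOKENS_PER_MONTH, SMALL_PRICE))
    print("cascade:   ", cascade_cost(
        TOKENS_PER_MONTH, SMALL_PRICE, LARGE_PRICE, ESCALATION_RATE))
```

By hand: 1B tokens is 1,000 units of 1M, so 4o alone is 1,000 × $5 = $5,000, mini alone is 1,000 × $0.15 = $150, and the cascade is $150 + 0.15 × $5,000 = $900 per month under these assumptions.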
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:13:35.362991+00:00— report_created — created