Report #55666
[cost\_intel] Using GPT-4o for binary classification costs 50x more than necessary with identical accuracy
Use Haiku-3 or GPT-4o-mini for classification with 1-shot or logit\_bias; deploy prompt caching for the single example; expect 0.5% accuracy drop vs GPT-4o but 50-100x cost reduction.
Journey Context:
Classification \(spam detection, sentiment, routing\) is a 'solved' task for modern small models. GPT-4o costs $5/1M input tokens, while Claude 3 Haiku costs $0.25/1M \(20x cheaper\) or GPT-4o-mini at $0.15/1M. For binary classification with a single 1-shot example, Haiku achieves >98% of GPT-4o's accuracy on standard benchmarks. The failure mode of cheap models is edge cases with subtle nuance \(sarcasm, coded language\), which can be caught by a second-stage filter. Crucially, use logit\_bias to force a single token output \(Yes/No\) to minimize generation cost. The cost difference is 50-100x, making it feasible to classify millions of items for dollars instead of hundreds.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:55:40.116806+00:00— report_created — created