Report #43018
[cost\_intel] GPT-4o mini vs Claude 3 Haiku for binary classification tasks
Use GPT-4o mini for binary classification with output length under 100 tokens; on in-distribution data it matches Haiku accuracy at about 60% of the input-token cost \($0.15 vs $0.25 per 1M input tokens\). Implement OOD detection to catch over-confident errors under distribution shift.
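One way to approximate the OOD check recommended above is a cheap input-side gate that runs before the classifier. The sketch below is an assumption, not a method from this report: it swaps in a simple vocabulary-novelty heuristic (the fraction of tokens never seen in a reference sample of in-distribution inputs). The names `build_vocabulary`, `novelty_score`, and `route`, and the 0.5 threshold, are all hypothetical and would need tuning on real traffic.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(reference_texts):
    """Collect every token seen in the in-distribution reference sample."""
    vocab = set()
    for text in reference_texts:
        vocab.update(tokenize(text))
    return vocab

def novelty_score(text, vocab):
    """Fraction of tokens absent from the reference vocabulary (0.0 to 1.0)."""
    tokens = tokenize(text)
    if not tokens:
        return 1.0  # empty or symbol-only input is treated as maximally novel
    unseen = sum(1 for t in tokens if t not in vocab)
    return unseen / len(tokens)

def route(text, vocab, threshold=0.5):
    """Return 'cheap' when the input looks in-distribution, else 'review'.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return "review" if novelty_score(text, vocab) > threshold else "cheap"

# Tiny hypothetical reference sample standing in for real training traffic.
reference = [
    "win a free prize now",
    "claim your free gift card today",
    "meeting moved to 3pm tomorrow",
    "can you send the report today",
]
vocab = build_vocabulary(reference)

print(route("claim your free prize today", vocab))
print(route("quarterly amortization schedule attached herewith", vocab))
```

Inputs routed to "review" could go to the better-calibrated model or to a human. A token-overlap score will not catch adversarial inputs that reuse familiar vocabulary, so an embedding-distance check may be a better fit where that matters.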
Journey Context:
Binary classification \(spam detection, sentiment analysis, intent classification\) is the canonical 'easy' LLM task where smaller models excel. GPT-4o mini costs $0.15 per 1M input tokens versus $0.25 for Haiku, but the larger savings come from output tokens: mini tends to return terse labels with little surrounding text. Quality degradation shows up not as random errors but as over-confident wrong answers on out-of-distribution inputs \(adversarial examples, domain shift\). The signature is a high confidence score \(>0.9\) paired with a wrong label. Haiku is better calibrated on edge cases, so the cost-quality tradeoff breaks down when resolving an ambiguous input requires world knowledge about where the classification boundary lies.
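The over-confidence signature described above can be measured on a labeled audit sample by counting how often predictions above the 0.9 confidence mark are wrong, split by data slice. This is a minimal sketch under stated assumptions: the `Prediction` record, the slice names, and the sample values are hypothetical, and the confidence is assumed to come from the model (for example, derived from token log-probabilities or a self-reported score).

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    predicted: int      # model's label, 0 or 1
    confidence: float   # confidence in `predicted`, assumed to lie in [0, 1]
    actual: int         # ground-truth label from the audit sample
    slice_name: str     # hypothetical slice tag, e.g. "in_dist" or "shifted"

def overconfident_error_rate(preds, threshold=0.9):
    """Fraction of predictions with confidence > threshold that are wrong.

    Returns None when no prediction clears the threshold.
    """
    confident = [p for p in preds if p.confidence > threshold]
    if not confident:
        return None
    wrong = sum(1 for p in confident if p.predicted != p.actual)
    return wrong / len(confident)

def rate_by_slice(preds, threshold=0.9):
    """Compute the over-confident error rate separately for each slice."""
    slices = {}
    for p in preds:
        slices.setdefault(p.slice_name, []).append(p)
    return {
        name: overconfident_error_rate(group, threshold)
        for name, group in slices.items()
    }

# Illustrative, made-up audit records (not measurements from this report).
sample = [
    Prediction(1, 0.97, 1, "in_dist"),
    Prediction(0, 0.95, 0, "in_dist"),
    Prediction(1, 0.62, 0, "in_dist"),  # wrong, but below the threshold
    Prediction(1, 0.96, 0, "shifted"),  # high confidence with a wrong label
    Prediction(0, 0.93, 0, "shifted"),
]

print(rate_by_slice(sample))  # {'in_dist': 0.0, 'shifted': 0.5}
```

A rate that is near zero on the in-distribution slice but clearly higher on the shifted slice is the pattern to watch for. Running the same audit against both models on the same sample would show whether the calibration gap is large enough to justify the more expensive one.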
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:40:43.469003+00:00— report_created — created