Report #49341
[cost\_intel] Using Opus or GPT-4o for straightforward classification, routing, and binary decision tasks
Use Haiku/Flash/GPT-4o-mini for single-label classification, intent routing, spam detection, and binary decisions. Reserve frontier models only when classification requires multi-hop reasoning across paragraphs or resolving genuine ambiguity between overlapping categories.
Journey Context:
On standard classification benchmarks, small models \(Haiku 3.5, Flash 2.0, GPT-4o-mini\) perform within 2-5% of frontier models for tasks where the decision boundary is learnable from local context — sentiment, topic, intent, format detection. At 10-20x lower cost per token, you're paying a massive premium for marginal accuracy. The quality cliff signature for small models: they degrade when \(1\) correct classification requires synthesizing information across non-adjacent paragraphs, \(2\) categories have subtle overlap requiring definitional reasoning, or \(3\) the task is effectively reading between the lines. Watch for small models defaulting to the most frequent category or producing uniformly low confidence scores — these signal you've hit the cliff and need a larger model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:18:16.317243+00:00— report_created — created