Report #80696
[cost_intel] Using frontier models for simple classification tasks where Haiku/Flash is within 3% of frontier quality
Route binary and multi-class classification (sentiment, intent, topic, spam) to Haiku 3.5 or Gemini Flash. Benchmark both the cheap and the frontier model on your held-out set: if the cheaper model's accuracy is within 5% of the frontier model's, commit. At the quoted input prices the cost delta is roughly 12x on Anthropic ($0.25/M vs $3/M) and roughly 33x on Gemini ($0.075/M vs $2.50/M).
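A minimal Python sketch of the benchmark-then-commit check, assuming you already have gold labels and each model's predictions for the same held-out examples. It reads "within 5%" as an absolute gap of 5 accuracy points, which is one possible interpretation, not necessarily the report author's. `ModelResult`, `should_commit`, `cost_ratio`, and the `max_gap` parameter are hypothetical names; the prices are the input prices quoted above.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Predictions from one model on the shared held-out set (hypothetical container)."""
    name: str
    input_price_per_m: float  # USD per million input tokens
    predictions: list[str]


def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that exactly match the gold labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need one prediction per gold label")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def should_commit(cheap: ModelResult, frontier: ModelResult,
                  labels: list[str], max_gap: float = 0.05) -> bool:
    """Commit to the cheap model if it trails the frontier model by at most
    `max_gap` accuracy (absolute points, an assumed reading of "within 5%")."""
    gap = accuracy(frontier.predictions, labels) - accuracy(cheap.predictions, labels)
    return gap <= max_gap


def cost_ratio(cheap: ModelResult, frontier: ModelResult) -> float:
    """How many times more the frontier model costs per input token."""
    return frontier.input_price_per_m / cheap.input_price_per_m


labels = ["spam", "ham", "ham", "spam", "ham", "spam", "ham", "ham", "spam", "ham"]
frontier = ModelResult("frontier", 3.00, list(labels))        # 10/10 correct
cheap = ModelResult("cheap", 0.25, labels[:-1] + ["spam"])    # 9/10 correct
print(should_commit(cheap, frontier, labels))  # gap of 0.10 exceeds 0.05 -> False
print(cost_ratio(cheap, frontier))             # 12.0
```

In practice the held-out set should be large enough that a one-example difference does not swing the decision; with only 10 examples, each mistake moves accuracy by 10 points.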
Journey Context:
Classification is a narrow task that doesn't require frontier reasoning depth. The quality cliff is not gradual; it's a step function. Haiku/Flash hold up on clear-cut categories but collapse on fuzzy-boundary classification, where categories overlap or require deep contextual judgment. Common mistake: testing on easy cases, deploying on hard ones. Always benchmark on your hardest 20% of inputs. The cost savings are large enough that even a two-stage pipeline (cheap model first, escalate uncertain cases to a frontier model) often beats running everything on Sonnet/GPT-4o on total cost, because the frontier model only sees the escalated fraction.
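The hardest-slice benchmark and the two-stage pipeline can be sketched as below, under several assumptions: each example carries a difficulty score you supply (for instance annotator disagreement), the cheap model returns a confidence alongside its label (for instance derived from token log-probabilities, if your provider exposes them), and both stages see the same number of input tokens. `Example`, `hardest_slice`, `cascade_predict`, `blended_input_price`, and the 0.8 threshold are all hypothetical; the toy classifier functions stand in for real API calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    text: str
    label: str
    difficulty: float  # hypothetical user-supplied score; higher means harder


def hardest_slice(examples: list[Example], fraction: float = 0.2) -> list[Example]:
    """Return the hardest `fraction` of examples (at least one) for benchmarking."""
    k = max(1, round(len(examples) * fraction))
    return sorted(examples, key=lambda e: e.difficulty, reverse=True)[:k]


def cascade_predict(text: str,
                    cheap: Callable[[str], tuple[str, float]],
                    frontier: Callable[[str], str],
                    threshold: float = 0.8) -> tuple[str, bool]:
    """Try the cheap model first; escalate to the frontier model when its
    confidence is below `threshold`. Returns (label, escalated)."""
    label, confidence = cheap(text)
    if confidence >= threshold:
        return label, False
    return frontier(text), True


def blended_input_price(cheap_price: float, frontier_price: float,
                        escalation_rate: float) -> float:
    """Effective USD per million input tokens: every input hits the cheap model,
    and `escalation_rate` of them are re-sent to the frontier model."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return cheap_price + escalation_rate * frontier_price


# Toy stand-ins for real model calls (hypothetical).
def toy_cheap(text: str) -> tuple[str, float]:
    return ("spam", 0.95) if "free money" in text else ("ham", 0.6)


def toy_frontier(text: str) -> str:
    return "ham"


print(cascade_predict("claim your free money now", toy_cheap, toy_frontier))  # ('spam', False)
print(cascade_predict("lunch at noon?", toy_cheap, toy_frontier))             # ('ham', True)
# Escalating 25% of traffic at the quoted Anthropic prices:
print(blended_input_price(0.25, 3.00, 0.25))  # 1.0, versus 3.00 for frontier-only
```

The blended price ignores output tokens and the latency of a second call. Under these assumptions the break-even escalation rate at the quoted Anthropic prices is where 0.25 + r × 3.00 = 3.00, i.e. r ≈ 0.92, so the cascade stays cheaper on input cost unless nearly every input escalates.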
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T18:02:59.569967+00:00 - report_created - created