Report #49110
[cost\_intel] Over-paying for Sonnet/Pro on classification tasks assuming smaller models cannot match accuracy
Use Claude 3.5 Haiku or GPT-4o-mini with 5-10 few-shot examples and explicit format instructions; this reaches within 2-3% F1 of Sonnet at roughly 1/12th the input cost \($0.25 vs $3 per 1M input tokens\)
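A minimal sketch of what "few-shot examples and explicit format instructions" could look like in practice. It only builds the prompt text and validates the reply, and makes no API call. The function names `build_fewshot_prompt` and `parse_label`, the `Input:`/`Output:` layout, and the sample labels are illustrative assumptions, not part of the report.

```python
from typing import Optional, Sequence, Tuple


def build_fewshot_prompt(
    labels: Sequence[str],
    examples: Sequence[Tuple[str, str]],
    text: str,
) -> str:
    """Assemble a few-shot classification prompt (hypothetical layout)."""
    if not examples:
        raise ValueError("provide at least one labelled example")
    unknown = {label for _, label in examples} - set(labels)
    if unknown:
        raise ValueError(f"examples use labels outside the label set: {sorted(unknown)}")

    # Explicit format instructions: closed label set, label-only reply.
    lines = [
        "Classify the input into exactly one of these labels: " + ", ".join(labels) + ".",
        "Reply with the label only, with no explanation or punctuation.",
        "",
    ]
    # Few-shot examples as input/output pairs.
    for example_text, label in examples:
        lines += [f"Input: {example_text}", f"Output: {label}", ""]
    # The item to classify, ending on an open "Output:" slot.
    lines += [f"Input: {text}", "Output:"]
    return "\n".join(lines)


def parse_label(raw: str, labels: Sequence[str]) -> Optional[str]:
    """Map a raw model reply to a known label, or None if it does not match."""
    cleaned = raw.strip().strip(".\"'").lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None


if __name__ == "__main__":
    labels = ["billing", "technical", "other"]
    examples = [
        ("I was charged twice this month", "billing"),
        ("The app crashes when I log in", "technical"),
    ]
    print(build_fewshot_prompt(labels, examples, "Where can I download my invoice?"))
```

Rejecting replies that fall outside the label set (a `None` from `parse_label`) gives a cheap signal for retrying or escalating an item to a larger model.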
Journey Context:
Engineers default to Sonnet/Opus for classification, assuming 'smaller models aren't reliable enough.' However, for binary or multi-class classification with clear categories, Haiku with few-shot examples \(input: output pairs in the prompt\) matches larger models because the task is pattern matching, not reasoning. The quality cliff appears when: \(1\) classes are semantically close and require nuanced distinction \(e.g., 'sarcasm' vs 'criticism'\), \(2\) context exceeds 8k tokens and the label depends on long-range dependencies, or \(3\) the task requires reasoning about the classification \(explainability\). Cost difference: Haiku input at $0.25/1M tokens vs Sonnet at $3/1M is a 12x reduction in input cost. At 1M classifications/month, that's $2,750 saved, which assumes roughly 1,000 input tokens per classification \(1B tokens/month: $3,000 on Sonnet vs $250 on Haiku\).
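The savings figure can be reproduced with a few lines of arithmetic. This sketch uses the per-token prices quoted in this report and assumes about 1,000 input tokens per classification, which is the value that makes the $2,750 figure work out. Output tokens are ignored, and `monthly_input_cost` is a hypothetical helper name.

```python
def monthly_input_cost(
    requests_per_month: int,
    tokens_per_request: int,
    price_per_million_tokens: float,
) -> float:
    """Input-token cost in dollars for one month of requests."""
    total_tokens = requests_per_month * tokens_per_request
    return total_tokens / 1_000_000 * price_per_million_tokens


# Prices as quoted in this report (dollars per 1M input tokens).
HAIKU_PRICE = 0.25
SONNET_PRICE = 3.00

# Assumption: ~1,000 input tokens per classification, few-shot examples included.
REQUESTS_PER_MONTH = 1_000_000
TOKENS_PER_REQUEST = 1_000

haiku = monthly_input_cost(REQUESTS_PER_MONTH, TOKENS_PER_REQUEST, HAIKU_PRICE)
sonnet = monthly_input_cost(REQUESTS_PER_MONTH, TOKENS_PER_REQUEST, SONNET_PRICE)

print(f"Haiku:  ${haiku:,.2f}")            # Haiku:  $250.00
print(f"Sonnet: ${sonnet:,.2f}")           # Sonnet: $3,000.00
print(f"Saving: ${sonnet - haiku:,.2f}")   # Saving: $2,750.00
print(f"Ratio:  {SONNET_PRICE / HAIKU_PRICE:.0f}x")  # Ratio:  12x
```

Few-shot examples are billed as input tokens on every request, so `TOKENS_PER_REQUEST` should count the full prompt, not just the text being classified.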
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:55:08.141209+00:00— report_created — created