Report #88492
[cost_intel] Overpaying for Sonnet 3.5 on deterministic classification tasks
Use Claude 3.5 Haiku for binary/ternary classification with explicit rubrics; it matches Sonnet 3.5 within 2-3% accuracy at roughly a quarter of the list price ($0.80 vs $3.00 per 1M input tokens, $4.00 vs $15.00 per 1M output tokens).
Journey Context:
Sonnet 3.5 excels at ambiguous, multi-faceted reasoning that requires calibration. For classification with a deterministic rubric (e.g., 'Is this invoice amount > $1000?'), Haiku 3.5 performs comparably, because the task is pattern matching rather than deep reasoning. The failure mode is nuance: when categories require world-knowledge disambiguation (e.g., detecting sarcasm in legal briefs), Sonnet pulls ahead. The hidden cost killer is few-shot padding: users often send 5-shot examples with Haiku to boost accuracy, adding 1k+ input tokens per request. On short prompts that can erase the price advantage, and it does not fix Haiku's calibration on ambiguous cases. The right heuristic: if the rubric fits on one line and has three or fewer classes, use Haiku; if it requires 'considering the broader context', use Sonnet.
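A minimal sketch of the two checks described above: the routing heuristic, and how many extra few-shot input tokens Haiku can absorb before a request costs as much as the same request on Sonnet without them. Everything named here is hypothetical and for illustration only: `Price`, `request_cost`, `break_even_extra_input`, and `pick_model` are not part of any SDK, the `"haiku"`/`"sonnet"` return values are plain labels rather than API model IDs, and the per-token prices and the 300-token prompt size are assumptions to be replaced with current list prices and your own traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens. The values used below are assumptions; check current pricing."""
    input_per_mtok: float
    output_per_mtok: float


# Assumed list prices (USD per 1M tokens); substitute your own.
HAIKU = Price(input_per_mtok=0.80, output_per_mtok=4.00)
SONNET = Price(input_per_mtok=3.00, output_per_mtok=15.00)


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the given prices."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


def break_even_extra_input(cheap: Price, strong: Price,
                           input_tokens: int, output_tokens: int) -> float:
    """Extra input tokens the cheap model can carry before it costs as much
    as the strong model running the same request without them."""
    gap = (request_cost(strong, input_tokens, output_tokens)
           - request_cost(cheap, input_tokens, output_tokens))
    return gap * 1_000_000 / cheap.input_per_mtok


def pick_model(rubric: str, num_classes: int) -> str:
    """Rule of thumb from the report: one-line rubric with <= 3 classes goes to
    Haiku; anything that leans on 'broader context' goes to Sonnet."""
    one_line = "\n" not in rubric.strip()
    needs_context = "broader context" in rubric.lower()
    if one_line and num_classes <= 3 and not needs_context:
        return "haiku"
    return "sonnet"


if __name__ == "__main__":
    # Assumed workload: 300-token prompt, 5-token label as output.
    budget = break_even_extra_input(HAIKU, SONNET, input_tokens=300, output_tokens=5)
    print(f"few-shot budget before Haiku costs more: about {budget:.0f} input tokens")
    print(pick_model("Is this invoice amount > $1000?", num_classes=2))
```

Under these assumed numbers the few-shot budget comes out below 1k tokens, so a 5-shot block of that size would make the Haiku request more expensive than the plain Sonnet one. Longer base prompts raise the budget proportionally, which is why the break-even is worth recomputing per workload rather than treated as a fixed threshold.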
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:06:56.943201+00:00— report_created — created