Report #46465
[cost_intel] Defaulting to reasoning models for all classification tasks
For binary or few-class classification on explicit features (invoice categorization, sentiment analysis, spam detection), use instruct models (GPT-4o, Claude 3.5 Sonnet), which this report puts at >95% accuracy for roughly 1/10th the cost or less. Reserve reasoning models for classification that requires implicit constraint satisfaction or multi-step logic.
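The routing rule above can be written down as a small decision function. This is a minimal sketch, not an established API: `ClassificationTask`, `pick_model_tier`, and the `few_class_limit` threshold of 5 are hypothetical names and values chosen for illustration, and the `"undecided"` branch covers a case the report does not address (many classes with explicit features).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationTask:
    """Hypothetical description of a classification workload."""
    name: str
    num_classes: int
    explicit_features: bool  # decision depends only on features stated in the input
    multi_step_logic: bool   # needs chained inference or implicit constraints


def pick_model_tier(task: ClassificationTask, few_class_limit: int = 5) -> str:
    """Return "instruct", "reasoning", or "undecided" for a task.

    few_class_limit is an assumed cutoff for "few-class"; the report
    does not give a number.
    """
    if task.multi_step_logic or not task.explicit_features:
        # Implicit constraints or multi-step logic: reserve a reasoning model.
        return "reasoning"
    if task.num_classes <= few_class_limit:
        # Binary or few-class pattern matching on explicit features.
        return "instruct"
    # Explicit features but many classes: not covered by the report.
    return "undecided"


spam = ClassificationTask("spam detection", 2, True, False)
contract = ClassificationTask("contract compliance", 3, False, True)
print(pick_model_tier(spam))      # instruct
print(pick_model_tier(contract))  # reasoning
```

Routing on task traits rather than on perceived difficulty keeps the default cheap and makes the choice of a reasoning model an explicit, reviewable decision.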
Journey Context:
Teams often default to o1/o3 for 'hard' classification. But when the task is pattern matching on explicit features (e.g., 'is this invoice amount > $1000 AND vendor = ACME'), instruct models can outperform reasoning models: the extra reasoning adds unnecessary search overhead and can hallucinate constraints that do not exist. The cost delta is stated as 10-50x, although the example prices given (GPT-4o at $0.005 vs o3 at $0.30 per 1M tokens) work out to 60x, so check the figures and units against current pricing. Quality degradation signature: reasoning models add spurious 'considerations' that over-complicate the decision boundary and flip simple classifications that were otherwise correct.
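One way to check both claims on your own workload is to compute the price ratio directly and to count the cases where the instruct model was right and the reasoning model was wrong. The sketch below assumes you already have predictions from both models on a labeled sample; the function names and the toy data are made up for illustration and are not measurements.

```python
from typing import List, Sequence


def cost_ratio(reasoning_price: float, instruct_price: float) -> float:
    """How many times more expensive the reasoning model is (same unit for both)."""
    if instruct_price <= 0:
        raise ValueError("instruct_price must be positive")
    return reasoning_price / instruct_price


def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the labels."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def spurious_flips(
    instruct_preds: Sequence[str],
    reasoning_preds: Sequence[str],
    labels: Sequence[str],
) -> List[int]:
    """Indices where the instruct model was correct and the reasoning model was not."""
    return [
        i
        for i, (a, b, y) in enumerate(zip(instruct_preds, reasoning_preds, labels))
        if a == y and b != y
    ]


# Toy data, invented for this example only.
labels = ["spam", "ham", "ham", "spam"]
instruct_preds = ["spam", "ham", "ham", "spam"]
reasoning_preds = ["spam", "spam", "ham", "spam"]

print(round(cost_ratio(0.30, 0.005)))  # ratio implied by the prices quoted in this report
print(accuracy(reasoning_preds, labels))
print(spurious_flips(instruct_preds, reasoning_preds, labels))
```

Reading the reasoning traces for the flipped indices is a direct way to see whether the reasoning model invented a constraint that is not present in the input.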
Lifecycle
2026-06-19T08:27:54.850783+00:00— report_created — created