Report #78626
[cost_intel] Defaulting to o1 for few-shot classification with abundant labeled data
Use GPT-4o with 20-50 few-shot examples for classification (94% F1 vs o1's 96%); reserve o1 for complex zero-shot classification only.
Journey Context:
In-context learning with high-quality examples often saturates classification performance before reasoning ability becomes the bottleneck. On standard intent classification or entity extraction tasks with more than 20 labeled examples per class, GPT-4o reaches ~94% F1 while o1 reaches ~96%. That 2-point F1 gain costs $15 per 1K requests with o1 versus $0.50 with GPT-4o, a 30x difference. Reasoning models show gains of more than 20% only in zero-shot or very-few-shot settings (fewer than 3 examples) on complex tasks. The heuristic: if you can afford to label 50 examples, you don't need o1 for classification. Use o1 only when labeling is impossible (for example, in novel domains) and the classification requires complex multi-step logic.
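A minimal Python sketch of the routing heuristic and the cost arithmetic above, assuming the report's figures ($0.50 vs $15 per 1K requests, ~94% vs ~96% F1) hold for your workload. `ClassificationTask`, `choose_model`, `cost_usd`, and the default threshold of 3 examples per class are hypothetical names and values chosen for illustration; they are not part of any OpenAI API, and the code makes no API calls.

```python
from dataclasses import dataclass

# Figures quoted in this report; treat them as assumptions, not current list prices.
COST_PER_1K_REQUESTS_USD = {"gpt-4o": 0.50, "o1": 15.00}
FEW_SHOT_F1_PERCENT = {"gpt-4o": 94, "o1": 96}  # with >20 labeled examples per class


@dataclass
class ClassificationTask:
    # Hypothetical fields describing a task; not part of any real API.
    labeled_examples_per_class: int
    needs_multistep_logic: bool


def choose_model(task: ClassificationTask, min_examples: int = 3) -> str:
    """Route to o1 only for (near) zero-shot tasks that need multi-step logic.

    min_examples=3 mirrors the report's "<3 examples" regime, where reasoning
    models were observed to pull ahead; tune it against your own data.
    """
    if task.labeled_examples_per_class < min_examples and task.needs_multistep_logic:
        return "o1"
    return "gpt-4o"


def cost_usd(model: str, requests: int) -> float:
    """Cost of `requests` calls at the per-1K rates above."""
    return COST_PER_1K_REQUESTS_USD[model] * requests / 1000


def extra_cost_per_f1_point(requests: int) -> float:
    """Additional spend for each F1 point that o1 adds over GPT-4o."""
    extra = cost_usd("o1", requests) - cost_usd("gpt-4o", requests)
    gain = FEW_SHOT_F1_PERCENT["o1"] - FEW_SHOT_F1_PERCENT["gpt-4o"]
    return extra / gain


if __name__ == "__main__":
    intent_task = ClassificationTask(labeled_examples_per_class=50, needs_multistep_logic=False)
    novel_task = ClassificationTask(labeled_examples_per_class=0, needs_multistep_logic=True)
    print(choose_model(intent_task))  # gpt-4o
    print(choose_model(novel_task))   # o1
    print(cost_usd("o1", 1000) / cost_usd("gpt-4o", 1000))  # 30.0
    print(extra_cost_per_f1_point(1_000_000))  # 7250.0
```

At 1M requests, these figures put the 2-point F1 gain at about $14,500 of extra spend, or $7,250 per F1 point. The router only picks o1 when both conditions from the heuristic hold (too few labels and multi-step logic), so a labeled-but-complex task still defaults to GPT-4o.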
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:34:06.383663+00:00 - report_created - created