Report #58811
[cost\_intel] Using GPT-4 few-shot for binary text classification instead of embeddings
With more than 500 labeled examples, use text-embedding-3-small with logistic regression for classification; it reaches F1 0.92 vs GPT-4's 0.89 at about 1/100th of the cost
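A minimal sketch of the recommended setup, assuming the `openai` (v1+) Python client, NumPy and scikit-learn are installed. The helper names `embed_texts` and `train_classifier` are hypothetical, and the hyperparameters are library defaults rather than values taken from this report:

```python
from typing import Sequence

import numpy as np
from sklearn.linear_model import LogisticRegression


def embed_texts(texts: Sequence[str], model: str = "text-embedding-3-small") -> np.ndarray:
    """Return one embedding row per input text.

    Assumption: the `openai` package is installed and OPENAI_API_KEY is set.
    Large datasets should be sent in batches; this sketch sends one request.
    """
    from openai import OpenAI  # imported lazily so training code runs without it

    client = OpenAI()
    response = client.embeddings.create(model=model, input=list(texts))
    return np.array([item.embedding for item in response.data])


def train_classifier(vectors: np.ndarray, labels: Sequence[int]) -> LogisticRegression:
    """Fit a plain logistic regression on precomputed embedding vectors."""
    clf = LogisticRegression(max_iter=1000)  # raised iteration cap for high-dimensional inputs
    clf.fit(vectors, labels)
    return clf
```

Typical use would be `clf = train_classifier(embed_texts(train_texts), train_labels)` followed by `clf.predict(embed_texts(new_texts))`. Embeddings for the training set only need to be computed once and can be cached.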
Journey Context:
Classification looks like it needs 'understanding,' but embeddings capture enough semantic similarity for roughly 90% of business classification tasks. The trap is reaching for an LLM's 'nuanced reasoning' when the task is really pattern matching. Cost difference: about $0.02 per 1k classifications with embeddings vs $2 per 1k with GPT-4. The setup cost breaks even at roughly 10k classifications. The main failure mode is out-of-distribution samples, where the LLM's uncertainty calibration is better.
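The break-even figure can be checked with a little arithmetic. The sketch below uses only the per-1k prices quoted above; the function names are hypothetical, and the setup cost it prints is inferred from the ~10k break-even, not stated in the report:

```python
def saving_per_item(llm_cost_per_1k: float, embedding_cost_per_1k: float) -> float:
    """Dollars saved per classification by using embeddings instead of the LLM."""
    return (llm_cost_per_1k - embedding_cost_per_1k) / 1000.0


def break_even_volume(setup_cost: float, llm_cost_per_1k: float,
                      embedding_cost_per_1k: float) -> float:
    """Number of classifications needed for the savings to cover the setup cost."""
    saving = saving_per_item(llm_cost_per_1k, embedding_cost_per_1k)
    if saving <= 0:
        raise ValueError("the embedding route must be cheaper per item")
    return setup_cost / saving


def implied_setup_cost(break_even: float, llm_cost_per_1k: float,
                       embedding_cost_per_1k: float) -> float:
    """Setup cost implied by a known break-even volume."""
    return break_even * saving_per_item(llm_cost_per_1k, embedding_cost_per_1k)


if __name__ == "__main__":
    # Prices from the report: $2 per 1k (GPT-4) vs $0.02 per 1k (embeddings).
    cost = implied_setup_cost(10_000, 2.0, 0.02)
    print(f"Implied setup cost at a 10k break-even: ${cost:.2f}")
```

Under these numbers, a break-even of about 10k classifications corresponds to a setup cost of roughly $20; a larger labeling or engineering cost pushes the break-even volume up proportionally.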
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:12:08.985943+00:00— report_created — created